APPEAL BRIEF



Sir: 



Appellants hereby submit an original and two copies of this Appeal Brief to the Board of Patent 
Appeals and Interferences ("the Board") in response to the Final Office Action mailed on 
December 2, 2002. The Notice of Appeal was timely submitted on March 31, 2003, and was received
in the Patent and Trademark Office ("the Office") on April 3, 2003. This Appeal Brief is timely submitted 
in light of the concurrently filed Petition for an Extension of Time of one month to and including 
July 3, 2003, and authorization to deduct the fee as required under 37 C.F.R. § 1.17(a)(1) from 
Appellants' Representatives' deposit account. The Commissioner is also authorized to charge the fee for 
filing this Appeal Brief ($160.00), as required under 37 C.F.R. § 1.17(c), to Lexicon Genetics 
Incorporated Deposit Account No. 50-0892. 

Appellants believe no fees in addition to the fee for filing the Appeal Brief and the fee for the 
extension of time are due in connection with this Appeal Brief. However, should any additional fees under
37 C.F.R. §§ 1.16 to 1.21 be required for any reason related to this communication, the Commissioner
is authorized to charge any underpayment or credit any overpayment to Lexicon Genetics Incorporated 
Deposit Account No. 50-0892. 

I. REAL PARTY IN INTEREST

The real party in interest is the Assignee, Lexicon Genetics Incorporated, 8800 Technology Forest 
Place, The Woodlands, Texas, 77381. 

II. RELATED APPEALS AND INTERFERENCES

Appellants know of no related appeals or interferences that will directly affect or be directly 
affected by or have a bearing on the Board's decision in the pending appeal. 
III. STATUS OF THE CLAIMS

The present application was filed on May 14, 2001, claiming the benefit of U.S. Provisional 
Application Number 60/205,275, which was filed on May 18, 2000, and included original claims 1-4. 
A First Official Action on the merits ("the First Action") was issued on December 3, 2001, in which claims
1-3 were rejected under 35 U.S.C. § 101 as allegedly lacking a patentable utility, claims 1-4 were rejected
under 35 U.S.C. § 112, first paragraph, as allegedly unusable by the skilled artisan due to the alleged lack
of patentable utility, claims 1-3 were variously rejected under 35 U.S.C. § 112, second paragraph, as
allegedly indefinite, and claim 1 was rejected under 35 U.S.C. § 112, first paragraph, as allegedly lacking
sufficient written description. In a response to the First Action submitted to the Office on March 1, 2002
("Response to the First Action"), Appellants amended claims 1-3 to further improve their clarity and 
addressed the rejections of claims 1-4. 

A Second and Non-Final Official Action on the merits ("the Second Action") was issued on May 
7, 2002, indicating that the rejection of claims 1 and 3 under 35 U.S.C. § 112, second paragraph, as 
allegedly indefinite had been overcome by the amendments and remarks submitted in the Response to the 
First Action, but maintaining the rejection of claims 1-4 under 35 U.S.C. § 101 as allegedly lacking a 
patentable utility, claims 1-4 under 35 U.S.C. § 112, first paragraph, as allegedly unusable by the skilled 
artisan due to the alleged lack of patentable utility, claim 2 under 35 U.S.C. § 112, second paragraph, as
allegedly indefinite, and claim 1 under 35 U.S.C. § 112, first paragraph, as allegedly lacking sufficient
written description, and newly rejecting claim 1 under 35 U.S.C. § 112, first paragraph, as allegedly not
enabled. In a response to the Second Action submitted to the Office on September 5, 2002 ("Response 
to the Second Action"), Appellants added claims 5-8, and addressed the rejections of claims 1-4. 

A Third and Final Official Action ("the Final Action") was mailed on December 2, 2002, 
maintaining the rejection of claims 1-4 (and including newly added claims 5-8) under 35 U.S.C. § 101 as 
allegedly lacking a patentable utility, claims 1-4 (and including newly added claims 5-8) under 
35 U.S.C. § 112, first paragraph, as allegedly unusable by the skilled artisan due to the alleged lack of
patentable utility, claim 2 under 35 U.S.C. § 112, second paragraph, as allegedly indefinite, claim 1 (and 
including newly added claims 5 and 8) under 35 U.S.C. § 112, first paragraph, as allegedly lacking 
sufficient written description, and claim 1 (and including newly added claims 5 and 8) under
35 U.S.C. § 112, first paragraph, as allegedly not enabled. In a response to the Final Action submitted
on March 31, 2003 ("Response to the Final Action"), Appellants again addressed the rejections of 
claims 1-8. An Advisory Action ("the Advisory Action") was mailed on April 29, 2003, maintaining the 
rejection of claims 1-8 under 35 U.S.C. § 101 as allegedly lacking a patentable utility, claims 1-8 under 
35 U.S.C. § 112, first paragraph, as allegedly unusable by the skilled artisan due to the alleged lack of
patentable utility, claim 2 under 35 U.S.C. § 112, second paragraph, as allegedly indefinite, claims 1, 5,
and 8 under 35 U.S.C. § 112, first paragraph, as allegedly lacking sufficient written description, and
claims 1, 5, and 8 under 35 U.S.C. § 112, first paragraph, as allegedly not enabled. Therefore, claims 1-8
are the subject of this appeal. A copy of the appealed claims is included below in the Appendix
(Section IX). 

IV. STATUS OF THE AMENDMENTS 

As no amendments subsequent to the Final Action have been filed, Appellants believe that no 
outstanding amendments exist. 

V. SUMMARY OF THE INVENTION 

The present invention relates to Appellants' discovery and identification of novel human 
polynucleotide sequences that encode a novel protein that shares structural similarity with mammalian 
proteases (specification at page 1, lines 11-12), and particularly serine proteases (specification at page 2,
lines 1-2). 

The presently claimed polynucleotide sequences were compiled from gene trapped cDNAs, in 
conjunction with cDNAs prepared from human brain, cerebellum, testis, kidney, skeletal muscle, thymus 
and salivary gland mRNAs (specification at page 3, lines 11-13). Two coding single nucleotide 
polymorphisms were identified in the claimed sequence - specifically, a G/A polymorphism at nucleotide 
position 343 of SEQ ID NO:1, which can result in a valine or isoleucine being present at corresponding
amino acid position 115 of SEQ ID NO:2; and a C/T polymorphism at nucleotide position 868 of
SEQ ID NO:1, which can result in a cysteine or arginine being present at corresponding amino acid
position 290 of SEQ ID NO:2. 
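
The correspondence between the recited nucleotide and amino acid positions can be checked with simple codon arithmetic. The following Python sketch is illustrative only. It assumes that the open reading frame begins at nucleotide position 1 of SEQ ID NO:1 (an assumption that is consistent with the recited positions but is not taken from the specification), the function names are hypothetical, and the two trailing codon bases used in the example are placeholders, since the actual codons are not reproduced here.

```python
# Partial standard genetic code: only the codon families relevant to
# a G/A or C/T change at the first codon position discussed above.
CODONS = {
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
    "ATT": "Ile", "ATC": "Ile", "ATA": "Ile", "ATG": "Met",
    "CGT": "Arg", "CGC": "Arg", "CGA": "Arg", "CGG": "Arg",
    "TGT": "Cys", "TGC": "Cys", "TGA": "Stop", "TGG": "Trp",
}


def codon_location(nt_pos, orf_start=1):
    """Map a 1-based nucleotide position to (amino acid position, position in codon).

    orf_start is the assumed 1-based position of the first coding nucleotide.
    """
    offset = nt_pos - orf_start
    return offset // 3 + 1, offset % 3 + 1


def first_base_alleles(alleles, last_two):
    """Translate a codon for each alternative first base (last_two is a placeholder)."""
    return {base: CODONS[base + last_two] for base in alleles}


if __name__ == "__main__":
    for nt in (343, 868):
        aa, pos = codon_location(nt)
        print(f"nucleotide {nt} -> amino acid {aa}, codon position {pos}")
    # Placeholder trailing bases "TT" and "GT" are assumptions for illustration.
    print(first_base_alleles("GA", "TT"))
    print(first_base_alleles("CT", "GT"))
```

Under the stated assumption, both polymorphic positions fall at the first base of their codons (codons 115 and 290), and a G/A or C/T change at a first codon position is consistent with the valine/isoleucine and arginine/cysteine alternatives recited above.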

The specification details a number of uses for the presently claimed polynucleotide sequences, 
including in diagnostic assays such as forensic analysis (see, for example, the specification at page 10, 
line 28), in identification of protein coding sequence and identification of exon splice junctions (see, for 
example, the specification at page 2, lines 23-25, and page 10, lines 28-33), in mapping the sequences to 
a specific region of a human chromosome (see, for example, the specification at page 2, lines 25-26), and 
in assessing gene expression patterns, particularly using a high throughput "chip" format (see, for example, 
the specification at page 5, lines 18-21). 

VI. ISSUES ON APPEAL 

1. Do claims 1-8 lack a patentable utility? 

2. Are claims 1-8 unusable by a skilled artisan due to a lack of patentable utility? 

3. Is claim 2 indefinite? 

4. Do claims 1, 5 and 8 lack sufficient written description? 

5. Are claims 1, 5 and 8 enabled? 

VII. GROUPING OF THE CLAIMS 

For the purposes of the outstanding rejections under 35 U.S.C. § 101 and 35 U.S.C. § 112, first
paragraph, associated with the utility rejection, the claims will stand or fall together. For the purposes of
the outstanding rejection under 35 U.S.C. § 112, second paragraph, claim 2 will stand or fall alone. For
the purposes of the outstanding rejections under 35 U.S.C. § 112, first paragraph, associated with written
description and enablement, claims 1, 5, and 8 will stand or fall together.

VIII. ARGUMENT 

A. Do Claims 1-8 Lack a Patentable Utility? 

The Final Action first rejects claims 1-8 under 35 U.S.C. § 101, as allegedly lacking a patentable 
utility due to not being supported by either a specific and substantial or a well-established utility.

Appellants pointed out in the Response to the Final Action that the present nucleic acid sequences 
have utility in forensic analysis, as described in the specification as originally filed (see, for example, 
page 10, line 28). As described in the specification from page 15, line 29 to page 16, line 2, the presently
claimed sequence defines two coding single nucleotide polymorphisms - specifically, a G/A polymorphism
at nucleotide position 343 of SEQ ID NO:1, which can result in a valine or isoleucine being present at
corresponding amino acid position 115 of SEQ ID NO:2; and a C/T polymorphism at nucleotide position
868 of SEQ ID NO:1, which can result in a cysteine or arginine being present at corresponding amino acid
position 290 of SEQ ID NO:2. As such polymorphisms are the basis for forensic analysis, which is
undoubtedly a "real world" utility, the presently claimed sequence must in itself be useful.

The Advisory Action states that the use of the present sequences in forensic analysis is not a 
specific utility because "it is unclear to the Examiner as to how this can be a specific utility for the claimed 
polynucleotides absent an indication as to how these polymorphisms can be used to distinguish between 
one person from another" (the Advisory Action at page 3). Appellants respectfully point out that the 
presently described polymorphisms can be used by those skilled in the art to "distinguish between one 
person from another" simply based on the presence or absence of the described polymorphism. The 
Examiner has provided no evidence of record that establishes that skilled artisans would not be able to use 
the presently described polymorphisms in forensic analysis exactly as they were described in the 
specification as originally filed, without any additional research. Importantly, the mere fact that
the use of these polymorphic markers will necessarily provide additional information on the percentage of
particular subpopulations that contain these polymorphic markers does not mean that additional research
is needed in order for these markers, as they are presently described in the instant specification, to be used
in forensic science. Thus, the Examiner has failed to meet her evidentiary burden of proving that the present
invention lacks utility.

This is also not a case of a potential utility. In the response to the Final Action, Appellants pointed 
out that even in the worst case scenario, the described polymorphisms are each useful to distinguish 50% 
of the population (in other words, the marker being present in half of the population). The Advisory Action
states that "the Examiner has not been able to locate any support in the specification in regard to this
assertion nor can the Examiner find any information in the art in regard to how one can use these specific 
polymorphisms in the claimed polynucleotides as a marker to distinguish 50% of the population" (the 
Advisory Action at page 3). First, Appellants point out that the ability of a polymorphic marker to 
distinguish at least 50% of the population is an inherent feature of any polymorphic marker, and this feature 
is well understood by those of skill in the art. Appellants note that as a matter of law, it is well settled that 
a patent need not disclose what is well known in the art. In re Wands, 8 USPQ2d 1400 (Fed. Cir. 1988).
Second, the assertion that the present claims lack utility because the Examiner cannot "find any information 
in the art in regard to how one can use these specific polymorphisms in the claimed polynucleotides as a 
marker to distinguish 50% of the population" (the Advisory Action at page 3, emphasis added) strains 
credulity. The Examiner seems to be suggesting that because the presently claimed sequences and
polymorphisms are novel, they cannot have a patentable utility. However, this is clearly not the
standard under 35 U.S.C. § 101. Appellants respectfully point out that all that is required to support
Appellants' assertion of utility is for the skilled artisan to believe that the presently described polymorphic
markers could be useful in forensic analysis. The fact that forensic biologists use polymorphic markers such
as those described by Appellants every day provides more than ample support for the assertion that forensic
biologists would also be able to use the specific polymorphic markers described by Appellants in the same 
fashion. Therefore, these allegations are completely without merit, and in no way establish that the present 
invention lacks utility. 

Additionally, the Examiner seems to be confusing the requirements of a specific utility with a
unique utility. The fact that other polymorphic markers have been identified in other genetic loci, or that
the use of the presently described polymorphic markers will provide additional information concerning the
prevalence of these markers in certain subpopulations, does not mean that Appellants' identification of
polymorphic markers in SEQ ID NO:1 is not specific. As clearly stated by the Federal Circuit in Carl
Zeiss Stiftung v. Renishaw PLC, 20 USPQ2d 1101 (Fed. Cir. 1991):

An invention need not be the best or only way to accomplish a certain result, and it need 
only be useful to some extent and in certain applications: "[T]he fact that an invention has 
only limited utility and is only operable in certain applications is not grounds for finding a 
lack of utility." Envirotech Corp. v. Al George, Inc., 221 USPQ 473, 480 (Fed. Cir.
1984) 

In other words, the fact that other (possibly better) polymorphic markers from the human genome have
been described, or that additional information about the presently described polymorphic markers can be
gained through the use of these markers, does not establish that the presently described polymorphic
markers lack a specific utility. The requirement for a specific utility, which is part of the standard for utility 
under 35 U.S.C. § 101 presently being applied by the Office, should not be confused with the requirement
for a unique utility, which is not the legal standard. If every invention were required to have a unique utility, 
the Patent and Trademark Office would no longer be issuing patents on batteries, automobile tires, golf 
balls, golf clubs, and treatments for a variety of human diseases, just to name a few particular examples, 
because other examples of each of these have already been described and patented. However, only the 
briefest perusal of virtually any issue of the Official Gazette provides numerous examples of patents being 
granted on each of the above compositions every week. Furthermore, if each invention needed to have
a unique utility in order to be patented, the entire class and subclass system would be an exercise in futility,
as the class and subclass system serves solely to group such common inventions, which would not be 
required if each invention needed to have a unique utility. In view of the above standards and "common 
sense" analysis, there can be little question that the present sequence clearly meets the requirements of 
35 U.S.C. § 101.

Furthermore, as the presently described polymorphisms are a part of the family of polymorphisms
that have a well established utility, the Federal Circuit's holding in In re Brana (34 USPQ2d 1436 (Fed.
Cir. 1995), "Brana") is directly on point. In Brana, the Federal Circuit admonished the Patent and
Trademark Office for confusing "the requirements under the law for obtaining a patent with the requirements
for obtaining government approval to market a particular drug for human consumption". Brana at 1442.
The Federal Circuit went on to state:

At issue in this case is an important question of the legal constraints on patent office 
examination practice and policy. The question is, with regard to pharmaceutical inventions, 
what must the applicant provide regarding the practical utility or usefulness of the invention 
for which patent protection is sought. This is not a new issue: it is one which we would 
have thought had been settled by case law years ago.
Brana at 1439, emphasis added. The choice of the phrase "utility or usefulness" in the foregoing quotation
is highly pertinent. The Federal Circuit is evidently using "utility" to refer to rejections under
35 U.S.C. § 101, and is using "usefulness" to refer to rejections under 35 U.S.C. § 112, first paragraph.
This is made evident in the continuing text in Brana, which explains the correlation between 35 U.S.C.
§§ 101 and 112, first paragraph. The Federal Circuit concluded:

FDA approval, however, is not a prerequisite for finding a compound useful within the 
meaning of the patent laws. Usefulness in patent law, and in particular in the context of 
pharmaceutical inventions, necessarily includes the expectation of further research and 
development. The stage at which an invention in this field becomes useful is well before
it is ready to be administered to humans. Were we to require Phase II testing in order to 
prove utility, the associated costs would prevent many companies from obtaining patent 
protection on promising new inventions, thereby eliminating an incentive to pursue, through 
research and development, potential cures in many crucial areas such as the treatment of 
cancer. 

Brana at 1442-1443, citations omitted, emphasis added. As set forth above, the present polymorphisms
are useful in forensic analysis as described in the specification as originally filed, without the need for any 
further research. As discussed above, even if the use of these polymorphic markers provided additional 
information on the percentage of particular subpopulations that contain these polymorphic markers, this 
would not mean that "additional research" is needed in order for these markers as they are presently 
described in the instant specification to be of use to forensic science. As stated above, using the 
polymorphic marker as described in the specification as originally filed can definitely distinguish members
of a population from one another. However, even if, arguendo, further research might be required in 
certain aspects of the present invention, this does not preclude a finding that the invention has utility, as set 
forth by the Federal Circuit's holding in Brana, which clearly states, as highlighted in the quote above, that
"pharmaceutical inventions, necessarily includes the expectation of further research and development"
(Brana at 1442-1443, emphasis added). In assessing the question of whether undue experimentation
would be required in order to practice the claimed invention, the key term is "undue", not
"experimentation". In re Angstadt and Griffin, 190 USPQ 214 (CCPA 1976). The need for some
experimentation does not render the claimed invention unpatentable. Indeed, a considerable amount of 
experimentation may be permissible if such experimentation is routinely practiced in the art. In re Angstadt 
and Griffin, supra; Amgen, Inc. v. Chugai Pharmaceutical Co., Ltd., 18 USPQ2d 1016 (Fed. Cir.
1991). Again, as a matter of law, it is well settled that a patent need not disclose what is well known in the
art (In re Wands, supra). 

Although Appellants need only make one credible assertion of utility to meet the requirements of 
35 U.S.C. § 101 (Raytheon v. Roper, 220 USPQ 592 (Fed. Cir. 1983); In re Gottlieb, 140 USPQ 665 
(CCPA 1964); In re Malachowski, 189 USPQ 432 (CCPA 1976); Hoffman v. Klaus, 9 USPQ2d 1657 
(Bd. Pat. App. & Inter. 1988)), Appellants noted in the response to the First Action, the response to the
Second Action, and the response to the Final Action, as a further example of the utility of the presently
claimed polynucleotide, that, as described in the specification at least at page 2, lines 24-26, the present
nucleotide sequences have a specific utility in "identification of protein coding sequence" and "mapping a
unique gene to a particular chromosome". This is evidenced by the fact that SEQ ID NO:1 can be used
to map the 5 coding exons of the gene comprising the presently claimed sequence on chromosome 4
(present within a chromosome 4 clone; GenBank Accession Number AC104819; alignment and the first
page from the GenBank report are presented in Exhibit A). Appellants respectfully remind the Board that
only a minor percentage (2-4%) of the genome actually encodes exons, which in turn encode amino acid
sequences. The presently claimed polynucleotide sequence provides biologically validated empirical data 
(e.g., showing which sequences are transcribed, spliced, and polyadenylated) that specifically define that 
portion of the corresponding genomic locus that actually encodes exon sequence. Equally significant is that 
the claimed polynucleotide sequence defines how the encoded exons are actually spliced together to 
produce an active transcript (i.e., the described sequences are useful for functionally defining exon splice
junctions). Such biologically validated splice junctions are superior to splice junctions that may have been
predicted from genomic sequence alone, and, as detailed in the specification at least at page 10, lines 28-33,
"sequences derived from regions adjacent to the intron/exon boundaries of the human gene can
be used to design primers for use in amplification assays to detect mutations within the exons, introns, splice
sites (e.g., splice acceptor and/or donor sites), etc., that can be used in diagnostics and
pharmacogenomics". Appellants respectfully submit that the practical scientific value of biologically
validated, expressed, spliced, and polyadenylated mRNA sequences is readily apparent to those skilled
in the relevant biological and biochemical arts.

Clearly, the present polynucleotide provides exquisite specificity in localizing the specific region of
human chromosome 4 that contains the gene encoding the given polynucleotide, a utility shared by
virtually no other nucleic acid sequence. In fact, it is this specificity that makes this particular sequence
so useful. Early gene mapping techniques relied on methods such as Giemsa staining to identify regions of 
chromosomes. However, such techniques produced genetic maps with a resolution of only 5 to 10 
megabases, far too low to be of much help in identifying specific genes involved in disease. The skilled 
artisan readily appreciates the significant benefit afforded by markers that map a specific locus of the human 
genome, such as the present nucleic acid sequence. For further evidence in support of the Appellants' 
position, the Board is requested to review, for example, section 3 of Venter et al. (2001, Science
291:1304, at pp. 1317-1321, including Fig. 11 at pp. 1324-1325; Exhibit B), which demonstrates the
significance of expressed sequence information in the structural analysis of genomic data. The presently 
claimed polynucleotide sequence defines a biologically validated sequence that provides a unique and 
specific resource for mapping the genome essentially as described in the Venter et al. article. Thus, the 
present claims clearly meet the requirements of 35 U.S.C. § 101. 

The Examiner also questions these asserted utilities in the Advisory Action. The Examiner states 
that "while Applicants assert that the claimed polynucleotide encodes 5 exons, no empirical determination 
has been made to corroborate that the claimed polynucleotide contains 5 exons" (the Advisory Action
bridging pages 5 and 6). Appellants are completely at a loss to understand what corroboration is required
to confirm that, as asserted by Appellants and shown in Exhibit A, the presently claimed
sequence contains 5 exons. In Exhibit A, Appellants have conducted the exact analysis used by those of
skill in the art to determine the number of exons present in a cDNA sequence - specifically, by comparing
the cDNA sequence (in this case, SEQ ID NO:1) to human genomic sequence, the exons present in the
cDNA clone are indicated by stretches of homologous sequences in one or more genomic clones that are
separated by introns (which, by definition, are missing from the cDNA clone). The Advisory Action further
states that "it is unclear to the Examiner as to how the information provided by Applicants is validated
empirical data or if one can use the claimed polynucleotide to map coding exons" (the Advisory Action at
page 6, emphasis added). Given the information shown in Exhibit A, there can be no doubt that
the presently claimed sequence can be used to "map coding exons". Thus, the Examiner's position does 
not support the alleged lack of utility. 
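
The cDNA-to-genomic comparison described above can be sketched in a few lines of Python. This is a hedged illustration rather than a reproduction of the analysis in Exhibit A: the alignment coordinates below are hypothetical placeholders, the names `Block` and `infer_introns` are invented for this sketch, and the code assumes that the cDNA and the genomic clone are aligned on the same strand.

```python
from typing import List, NamedTuple, Tuple


class Block(NamedTuple):
    """One ungapped cDNA-to-genomic alignment block (1-based, inclusive)."""
    cdna_start: int
    cdna_end: int
    genomic_start: int
    genomic_end: int


def infer_introns(blocks: List[Block]) -> List[Tuple[int, int]]:
    """Return genomic intervals lying between consecutive aligned blocks.

    Each block is treated as one exon; the genomic sequence between two
    blocks that are adjacent in the cDNA is treated as an intron.
    """
    ordered = sorted(blocks, key=lambda b: b.cdna_start)
    introns = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt.cdna_start != prev.cdna_end + 1:
            raise ValueError("cDNA blocks are not contiguous")
        if nxt.genomic_start <= prev.genomic_end + 1:
            raise ValueError("no genomic gap between adjacent blocks")
        introns.append((prev.genomic_end + 1, nxt.genomic_start - 1))
    return introns


# Hypothetical coordinates for a five-exon transcript (not from Exhibit A).
EXAMPLE_BLOCKS = [
    Block(1, 120, 1000, 1119),
    Block(121, 300, 2500, 2679),
    Block(301, 450, 4000, 4149),
    Block(451, 700, 6000, 6249),
    Block(701, 900, 9000, 9199),
]

if __name__ == "__main__":
    introns = infer_introns(EXAMPLE_BLOCKS)
    print(f"exons: {len(EXAMPLE_BLOCKS)}, introns: {len(introns)}")
    for start, end in introns:
        print(f"intron at genomic positions {start}-{end}")
```

In this sketch, the number of exons is simply the number of aligned blocks, and each block boundary marks a candidate splice junction.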

With regard to the utility of mapping the protein coding regions of chromosome 4, the Advisory 
Action states that this utility is not specific because "any polynucleotide in human chromosome 4 can be 
used to identify that chromosome" (the Advisory Action at page 5). This argument fails to support the 
Examiner's allegation of a lack of utility in at least two respects. First, Appellants respectfully point out that
while non-coding nucleotide sequences from this precise region of chromosome 4 could be used to map 
the introns and exons as described above, it would only be possible using the information provided 
by Appellants in the specification as originally filed - specifically, one needs to know which sequences 
correspond to the coding regions in order to use non-coding sequences to map intron/exon junctions. This 
is a classic case of hindsight reconstruction, using the information provided for the first time in Appellants'
own application against them in an attempt to question Appellants' assertion of utility, and does not serve
as a proper foundation for such an allegation. Second, and most importantly, the Examiner again seems 
to be confusing the requirements of a specific utility with a unique utility. The fact that a small number of 
other nucleotide sequences could be used to map the protein coding regions in this specific region of 
chromosome 4 does not mean that the use of Appellants' sequence to map the protein coding regions of 
chromosome 4 is not specific (Carl Zeiss Stiftung v. Renishaw PLC, supra). 

Additionally, Appellants noted in the response to the First Action, the response to the Second 
Action, and the response to the Final Action, that a sequence sharing 99% homology over an
extended region with the described sequence is present in the leading scientific repository for biological
sequence data (GenBank), and has been annotated by third party scientists wholly unaffiliated with
Appellants as a sequence "similar to epidermis specific serine protease" from humans (GenBank accession
number XM_093852; alignment shown in Exhibit C). Furthermore, there is another sequence that shares
over 99% homology over an extended region with the described sequence, which is present in the
GenBank patent database, and has been annotated by third party scientists wholly unaffiliated with
Appellants as a "protease" (GenBank patent database accession number AX360076; alignment and patent
information provided in Exhibit D). The legal test for utility simply involves an assessment of whether those
skilled in the art would find any of the utilities described for the invention to be credible or believable. 
Given these GenBank citations, there can be no doubt that those skilled in the art would clearly believe that 
Appellants' sequence is a serine protease. 

The Examiner has repeatedly questioned Appellants' assertion that the presently claimed sequence 
encodes a serine protease. In the First Action, the Examiner cited Bork (Genome Research 10:398-400,
2000) as supporting the proposition that those skilled in the art would not believe that a protein has a given
biochemical activity if it displays less than 70% homology with related proteins annotated as having that
activity. First, even if, arguendo, one accepts the Examiner's contention that greater than 70% homology
is required for one skilled in the art to believe that a sequence encodes a certain activity, the presently
claimed sequence has exceeded the Examiner's, albeit arbitrary, threshold of sequence relatedness.
Second, and more importantly, the 70% figure cited from the Bork article relates to the 70% accuracy of
the resulting prediction, not 70% homology. In fact, nowhere in Bork is there a comparison of the
prediction accuracy based on the percentage homology between two proteins or two classes of proteins,
and thus Bork does not support the alleged lack of utility for the present invention.

The First Action next cited Smith and Zhang (Nature Biotechnology 15:1222-1223, 1997) as
teaching "that there are numerous cases in which proteins of very different functions are homologous" (the 
First Action at page 4). However, the Smith and Zhang article also states "the major problems associated 
with nearly all of the current automated annotation approaches are - paradoxically - minor database 
annotation inconsistencies (and a few outright errors)" (page 1222, second column, first paragraph, 
emphasis added). Thus, Smith and Zhang do not in fact seem to stand for the proposition that prediction
of function based on homology is fraught with uncertainty, and thus also do not support the alleged lack
of utility.

The First Action next cited Brenner (TIG 15:132-133, 1999) as teaching that "most homologs must
have different molecular and cellular functions" (the First Action at page 4). However, this statement is
based on the assumption that "if there are only 1000 superfamilies in nature, then most homologs must have
different molecular and cellular functions" (page 132, second column). Furthermore, Brenner suggests that
one of the main problems in using homology to predict function is "an issue solvable by appropriate use of
modern and accurate sequence comparison procedures" (page 132, second column), and in fact references
an article by Altschul et al., which is the basis for one of the "modern and accurate sequence comparison
procedures" used by Appellants. Thus, the Brenner article also does not support the alleged lack of utility.

Finally, the First Action cited Broun et al. (Science 282:1315-1317, 1998) and Van de Loo et al.
(Proc. Natl. Acad. Sci. USA 92:6743-6747, 1995) as teaching that prediction of function based on
homology is unpredictable. The Final Action and the Advisory Action reiterate these citations. However,
these papers cite only one example, microsomal oleate desaturase/oleate 12-hydroxylase, where a prediction
of function based on sequence homology proved to be incorrect. One example out of the thousands of predictions
of function based on homology that exist in the art is hardly indicative of a high level of uncertainty, and thus
these papers also do not support the alleged lack of utility.

The Advisory Action, continuing on this theme, now cites articles by Seffernick et al. (J. Bacteriol.
183:2405-2410, 2001) and Witkowski et al. (Biochemistry 38:11643-11650, 1999) to again attempt to
support the proposition that prediction of protein function from homology information is somewhat 
unpredictable. However, while Appellants have provided evidence of record that conclusively establishes 
that those skilled in the art would believe that the specifically claimed sequence encodes a serine protease, 
the Examiner has provided no evidence that directly establishes that the specifically claimed sequence does 
not encode a serine protease. Accordingly, the evidence of record compels a finding that the present 
invention has a patentable utility. 

Furthermore, with regard to the citation of journal articles to support an allegation of a lack of utility, 
the PTO has repeatedly attempted to deny the utility of nucleic acid sequences based on a small number 
of publications that call into doubt prediction of protein function from homology information and the 
usefulness of bioinformatic predictions, of which these articles are merely the latest examples. Appellants 
readily agree that there is not 100% consensus within the scientific community regarding prediction of 
protein function from homology information, and further agree that prediction of protein function from 
homology information is not 100% accurate. However, Appellants respectfully point out that the lack of 
100% consensus on prediction of protein function from homology information is completely irrelevant 
to the question of whether the claimed nucleic acid sequence has a substantial and specific utility, and that 
100% accuracy of prediction of protein function from homology information is not the standard for 
patentability under 35 U.S.C. § 101. Appellants respectfully point out that, as discussed above, the legal 
test for utility simply involves an assessment of whether those skilled in the art would find any of the utilities 
described for the invention to be believable. Appellants submit that the overwhelming majority of those 
of skill in the relevant art would believe prediction of protein function from homology information and the 
usefulness of bioinformatic predictions to be powerful and useful tools, as evidenced by hundreds if not 
thousands of journal articles (which Appellants will submit to the Office if the Board truly doubts 
Appellants' assertion that the overwhelming majority of those of skill in the art place a high value on 
prediction of protein function from homology information and the usefulness of bioinformatic predictions), 
and would thus believe that Appellants' sequence is a serine protease. As believability is the standard 
for meeting the utility requirement of 35 U.S.C. § 101, and not 100% consensus or 100% accuracy, 
Appellants submit that the present claims must clearly meet the requirements of 35 U.S.C. § 101. 

Thus, those of skill in the art would readily appreciate the importance of tracking the expression 
of the gene encoding the described protein, for example using high-throughput DNA chips, as the 
specification details on page 5, lines 19-21. Such "DNA chips" clearly have utility, as evidenced by 
hundreds of issued U.S. Patents, as exemplified by U.S. Patent Nos. 5,445,934 (Exhibit E), 5,556,752 
(Exhibit F), 5,744,305 (Exhibit G), 5,837,832 (Exhibit H), 6,156,501 (Exhibit I) and 6,261,776 
(Exhibit J). Evidence of the "real world" substantial utility of the present invention is further provided by 
the fact that there is an entire industry established based on the use of gene sequences or fragments thereof 
in a gene chip format. Perhaps the most notable gene chip company is Affymetrix. However, there are 
many companies that have, at one time or another, concentrated on the use of gene sequences or fragments, 
in gene chip and non-gene chip formats, for example: Gene Logic, ABI-Perkin-Elmer, HySeq and Incyte. 
In addition, one such company (Rosetta Inpharmatics) was viewed to have such "real world" value that it 
was acquired by a large pharmaceutical company (Merck) for significant sums of money (net equity value 
of the transaction was $620 million). The "real world" substantial industrial utility of gene sequences or 
fragments would, therefore, appear to be widespread and well established. Clearly, there can be no doubt 
that the skilled artisan would know how to use the presently claimed sequences (see Section VIII(B), 
below), strongly arguing that the claimed sequences have utility. Given the widespread utility of such "gene 
chip" methods using public domain gene sequence information, there can be little doubt that the use of the 
presently described novel sequences would have great utility in such DNA chip applications. As the 
present sequences are specific markers of the human genome (see above), and such specific markers are 
targets for the discovery of drugs that are associated with human disease, those of skill in the art would 
instantly recognize that the present nucleotide sequences would be ideal, novel candidates for assessing 
gene expression using such DNA chips. Clearly, compositions that enhance the utility of such DNA chips, 
such as the presently claimed nucleotide sequences, must in themselves be useful. Thus, the present claims 
clearly meet the requirements of 35 U.S.C. § 101. 

The Advisory Action also questions this utility, stating "the specification is silent in regard to its 
substrate or its biological function" (the Advisory Action at page 5). The Advisory Action goes on to cite 
articles by Walker et al. (Cellular and Molecular Life Sciences 58:596-624, 2001) and Caughey (Am. J. 
Respir. Crit. Care Med. 150:S138-S142, 1994) to show that proteases have different substrates and 
different specific biological roles, and concludes that therefore "further research" (the Advisory Action at 
page 5) would be required in order for the skilled artisan to use the presently claimed sequence. However, 
this argument is thwarted by the fact that expression profiling (as well as the utilities discussed above) does 
not require a knowledge of the function of the particular nucleic acid on the chip - rather the gene chip 
indicates which DNA fragments are expressed at greater or lesser levels in two or more particular tissue 
types. Skilled artisans already have used and continue to use sequences such as Appellants' in gene chip 
applications without further experimentation. Appellants respectfully point out that this is exactly how most 
gene chip applications are carried out. Furthermore, the fact that additional information concerning the 
presently claimed sequence might make it even more useful in certain gene chip embodiments does not 
mean that the use of Appellants' sequence to track gene expression on a gene chip is not specific (Carl 
Zeiss Stiftung v. Renishaw PLC, supra). Therefore, this argument also fails to support the alleged lack 
of utility of the presently claimed compositions. 

Clearly, persons of skill in the art, as well as venture capitalists and investors, readily recognize the 
utility, both scientific and commercial, of genomic data in general, and specifically human genomic data. 
Billions of dollars have been invested in the human genome project, resulting in useful genomic data (see, 
e.g., Venter et al., supra; Exhibit B). The results have been a stunning success, as the utility of human 
genomic data has been widely recognized as a great gift to humanity (see, e.g., Jasny and Kennedy, 2001, 
Science 291:1153; Exhibit K). Clearly, the usefulness of human genomic data, such as the presently 
claimed nucleic acid molecules, is substantial and credible (worthy of billions of dollars and the creation of 
numerous companies focused on such information) and well-established (the utility of human genomic 
information has been clearly understood for many years). 

Importantly, it has been clearly established that a statement of utility in a specification must be 
accepted absent reasons why one skilled in the art would have reason to doubt the objective truth of such 
statement. In re Langer, 503 F.2d 1380, 1391, 183 USPQ 288, 297 (CCPA, 1974; "Langer"); In re 
Marzocchi, 439 F.2d 220, 224, 169 USPQ 367, 370 (CCPA, 1971). As clearly set forth in Langer: 

As a matter of Patent Office practice, a specification which contains a disclosure of utility 
which corresponds in scope to the subject matter sought to be patented must be taken as 
sufficient to satisfy the utility requirement of § 101 for the entire claimed subject matter 
unless there is a reason for one skilled in the art to question the objective truth of the 
statement of utility or its scope. 

Langer at 297, emphasis in original. As set forth in the MPEP, "Office personnel must provide evidence 
sufficient to show that the statement of asserted utility would be considered 'false' by a person of ordinary 
skill in the art" (MPEP, Eighth Edition at 2100-40, emphasis added). Thus, the present claims clearly meet 
the requirements of 35 U.S.C. § 101. 

Furthermore, regarding the utility requirements under 35 U.S.C. § 101, the Federal Circuit has 
clearly stated "(t)he threshold of utility is not high: An invention is 'useful' under section 101 if it is capable 
of providing some identifiable benefit." Juicy Whip Inc. v. Orange Bang Inc., 185 F.3d 1364, 51 
USPQ2d 1700 (Fed. Cir. 1999) (citing Brenner v. Manson, 383 U.S. 519, 534 (1966)). Additionally, 
the Federal Circuit has stated that "(t)o violate § 101 the claimed device must be totally incapable of 
achieving a useful result." Brooktree Corp. v. Advanced Micro Devices, Inc., 977 F.2d 1555, 1571, 
24 USPQ2d 1401 (Fed. Cir. 1992), emphasis added. Cross v. Iizuka (753 F.2d 1040, 224 USPQ 739 
(Fed. Cir. 1985); "Cross") states "any utility of the claimed compounds is sufficient to satisfy 35 U.S.C. 
§ 101". Cross at 748, emphasis added. Indeed, the Federal Circuit recently emphatically confirmed that 
"anything under the sun that is made by man" is patentable (State Street Bank & Trust Co. v. Signature 
Financial Group Inc., 149 F.3d 1368, 47 USPQ2d 1596, 1600 (Fed. Cir. 1998), citing the U.S. 
Supreme Court's decision in Diamond v. Chakrabarty, 447 U.S. 303, 206 USPQ 193 (U.S., 1980)). 
Thus, based on the relevant case law, the present claims clearly meet the requirements of 35 U.S.C. § 101. 

Finally, while Appellants are well aware of the new Utility Guidelines set forth by the USPTO, 
Appellants respectfully point out that the current rules and regulations regarding the examination of patent 
applications are and always have been the patent laws as set forth in 35 U.S.C. and the patent rules as set 
forth in 37 C.F.R., not the Manual of Patent Examining Procedure or particular guidelines for patent 
examination set forth by the USPTO. Furthermore, it is the job of the judiciary, not the USPTO, to 
interpret these laws and rules. Appellants are unaware of any significant recent changes in either 
35 U.S.C. § 101, or in the interpretation of 35 U.S.C. § 101 by the Supreme Court or the Federal Circuit 
that are in keeping with the new Utility Guidelines set forth by the USPTO. This is underscored by numerous 
patents that have been issued over the years that claim nucleic acid fragments that do not comply with the 
new Utility Guidelines. As examples of such issued U.S. Patents, the Board is invited to review U.S. Patent 
Nos. 5,817,479 (Exhibit L), 5,654,173 (Exhibit M), and 5,552,281 (Exhibit N; each of which claims 
short polynucleotides), and recently issued U.S. Patent No. 6,340,583 (Exhibit O; which includes no 
working examples), none of which contain examples of the "real-world" utilities that the Examiner seems 
to be requiring. As issued U.S. Patents are presumed to meet all of the requirements for patentability, 
including 35 U.S.C. §§ 101 and 112, first paragraph (see Section VIII(B), below), Appellants submit that 
the present polynucleotides must also meet the requirements of 35 U.S.C. § 101. While Appellants 
understand that each application is examined on its own merits, Appellants are unaware of any changes to 
35 U.S.C. § 101, or in the interpretation of 35 U.S.C. § 101 by the Supreme Court or the Federal Circuit, 
since the issuance of these patents that render the subject matter claimed in these patents, which is similar 
to the subject matter in question in the present application, suddenly non-statutory or unable to meet the 
requirements of 35 U.S.C. § 101. Thus, holding Appellants to a different standard of utility would be 
arbitrary and capricious, and, like other clear violations of due process, cannot stand. 

For each of the foregoing reasons, Appellants submit that the rejection of claims 1-8 under 
35 U.S.C. § 101 must be overruled. 

B. Are Claims 1-8 Unusable Due to a Lack of Patentable Utility? 

The Final Action next rejects claims 1-8 under 35 U.S.C. § 112, first paragraph, since allegedly 
one skilled in the art would not know how to use the invention, as the invention allegedly is not supported 
by either a clear asserted utility or a well-established utility. 

The arguments detailed above in Section VIII(A) concerning the utility of the presently claimed 
sequences are incorporated herein by reference. As the Federal Circuit and its predecessor have 
determined that the utility requirement of Section 101 and the how to use requirement of Section 112, first 
paragraph, have the same basis, specifically the disclosure of a credible utility (In re Brana, supra; In re 
Jolles, 628 F.2d 1322, 1326 n.11, 206 USPQ 885, 889 n.11 (CCPA 1980); In re Fouche, 439 F.2d 
1237, 1243, 169 USPQ 429, 434 (CCPA 1971)), Appellants submit that as claims 1-8 have been shown 
to have "a specific, substantial, and credible utility", as detailed in Section VIII(A) above, the present 
rejection of claims 1-8 under 35 U.S.C. § 112, first paragraph, cannot stand. 

Appellants therefore submit that the rejection of claims 1-8 under 35 U.S.C. § 112, first paragraph, 
must be overruled. 

C. Is Claim 2 Indefinite? 

The Final Action next rejected claim 2 under 35 U.S.C. § 112, second paragraph, as allegedly 
being indefinite for failing to particularly point out and distinctly claim the invention. 

The Final Action rejects claim 2 as allegedly indefinite based on the term "sequence" in relation to 
nucleic acid hybridization, since "(h)ybridization occurs only between molecules" (the Final Action at 
page 6), and the term "the complement thereof", since this term is allegedly indefinite "for reasons of 
record" (the Final Action at page 6). First, Appellants stress that "a claim need not 'describe' the invention, 
such description being the role of the disclosure". Orthokinetics, Inc. v. Safety Travel Chairs, Inc., 
1 USPQ2d 1081, 1088 (Fed. Cir. 1986). Appellants respectfully point out that the skilled artisan would 
clearly understand how the nucleotide sequence could hybridize, within the parameters set forth in claim 2, 
and would also understand the term "the complement" (as clearly 
opposed to "a" complement) to refer to the complete complement of SEQ ID NO: 1. The claims are 
therefore sufficiently definite when read in light of the specification, which reasonably apprises those skilled 
in the art both of the utilization and scope of the invention. Shatterproof Glass Corp. v. Libbey Owens 
Ford Co., 225 USPQ 634, 641 (Fed. Cir. 1985); Miles Laboratories, Inc. v. Shandon, 997 F.2d 870, 
875, 27 USPQ2d 1123, 1126 (Fed. Cir. 1993); Union Pacific Resources Co. v. Chesapeake Energy 
Corp., 236 F.3d 684, 692, 57 USPQ2d 1293, 1297 (Fed. Cir. 2001); North American Vaccine, Inc. 
v. American Cyanamid Co., F.3d 1571, 1579, 28 USPQ2d 1333, 1339 (Fed. Cir. 1993); Hybritech, 
Inc. v. Monoclonal Antibodies, 802 F.2d 1367, 1385, 231 USPQ 81, 94-95 (Fed. Cir. 1986). 

More importantly, however, Appellants submit that the United States Patent and Trademark Office 
itself finds this exact language to meet the requirements of 35 U.S.C. § 112, second paragraph, as 
evidenced at least by issued U.S. Patent Nos. 6,531,309 (Exhibit P), 6,511,840 (Exhibit Q), 6,476,210 
(Exhibit R), 6,465,632 (Exhibit S), 6,462,186 (Exhibit T), 6,448,388 (Exhibit U), 6,444,456 
(Exhibit V), 6,444,153 (Exhibit W) and 6,403,784 (Exhibit X), each of which is assigned 
to the same entity as the present application and contains the exact same language that the Examiner finds 
indefinite in the present case. As issued U.S. Patents are presumed to meet all of the requirements for 
patentability, including 35 U.S.C. § 112, second paragraph, Appellants submit that claim 2 must also meet 
the requirements of 35 U.S.C. § 112, second paragraph. Holding Appellants to a different standard of 
definiteness would be arbitrary and capricious, and, like other clear violations of due process, cannot stand. 

For each of the foregoing reasons, Appellants submit that the rejection of claim 2 under 
35 U.S.C. § 112, second paragraph, must be overruled. 

D. Do Claims 1, 5 and 8 Lack Sufficient Written Description? 

The Final Action next rejected claims 1, 5 and 8 under 35 U.S.C. § 112, first paragraph, as 
allegedly containing subject matter that was not described in the specification in such a way as to 
reasonably convey to one skilled in the relevant art that the inventors, at the time the application was filed, 
had possession of the claimed invention. 

The Examiner seems to be requiring that the function of each of the members of the genus be 
known in order to satisfy the written description requirement. However, the Examiner's stated position 
completely misreads the written description requirement. The repeated citations of Broun et al. (supra), 
Van de Loo et al. (supra), Seffernick et al. (supra), and Witkowski et al. (supra), each of which was 
dealt with in Section VIII(A), above, for the proposition that changes in protein sequence can lead to 
changes in protein function, are completely irrelevant to the present question of compliance with 
35 U.S.C. § 112, first paragraph. As set forth in the response to the Second Action and the response to 
the Final Action, the relevant section of the written description guidelines is herein reproduced with numbers 
corresponding to the ways in which the written description requirement can be satisfied: (1) by actual 
reduction to practice, (2) reduction to drawings, or (3) by disclosure of relevant, identifying characteristics, 
i.e., (4) structure or other physical and/or chemical properties, (5) by functional characteristics coupled with 
a known or disclosed correlation between function and structure, or (6) by a combination of such identifying 
characteristics. Thus, the written description requirements can be satisfied by (1), (2), or (3), and part (3) 
can be satisfied by (4), (5), or (6). Appellants submit that claim 1 provides "structure or other physical or 
chemical properties", specifically, the nucleotide sequence itself. There is no requirement within 
section (4) for functional characteristics, this being included in sections (5) and (6) only. Thus, since claims 
1, 5 and 8 satisfy section (3) by satisfying section (4), claims 1, 5 and 8 must meet the written description 
requirement. 

35 U.S.C. § 112, first paragraph, requires that the specification contain a written description of the 
invention. The Federal Circuit in Vas-Cath Inc. v. Mahurkar (19 USPQ2d 1111 (Fed. Cir. 1991); 
"Vas-Cath") held that an "applicant must convey with reasonable clarity to those skilled in the art that, as of the 
filing date sought, he or she was in possession of the invention". Vas-Cath at 1117, emphasis in original. 
However, it is important to note that the above finding uses the terms reasonable clarity to those skilled in 
the art. Further, the Federal Circuit in In re Gosteli (10 USPQ2d 1614 (Fed. Cir. 1989); "Gosteli") held: 

Although [the applicant] does not have to describe exactly the subject matter claimed, . . . 
the description must clearly allow persons of ordinary skill in the art to recognize that [he 
or she] invented what is claimed. 

Gosteli at 1618, emphasis added. Additionally, Utter v. Hiraga (6 USPQ2d 1709 (Fed. Cir. 1988); 
"Utter"), held "(a) specification may, within the meaning of 35 U.S.C. § 112 ¶ 1, contain a written 
description of a broadly claimed invention without describing all species that claim encompasses" (Utter 
at 1714). Therefore, all Appellants must do to comply with 35 U.S.C. § 112, first paragraph, is to convey 
the invention with reasonable clarity to the skilled artisan. 

Further, the Federal Circuit has held that an adequate description of a chemical genus "requires a 
precise definition, such as by structure, formula, chemical name or physical properties" sufficient to 
distinguish the genus from other materials. Fiers v. Revel, 25 USPQ2d 1601, 1606 (Fed. Cir. 1993; 
"Fiers"). Fiers goes on to hold that the "application satisfies the written description requirement since it 
sets forth the . . . nucleotide sequence" (Fiers at 1607). In other words, provision of a structure and 
formula - the nucleotide sequence - renders the application in compliance with 35 U.S.C. § 112, first 
paragraph. 

More recently, the standard for complying with the written description requirement in claims 
involving chemical materials has been explicitly set forth by the Federal Circuit: 

In claims involving chemical materials, generic formulae usually indicate with specificity 
what the generic claims encompass. One skilled in the art can distinguish such a formula 
from others and can identify many of the species that the claims encompass. Accordingly, 
such a formula is normally an adequate description of the claimed genus. Regents of 
Univ. of California v. Eli Lilly and Co., 43 USPQ2d 1398, 1406 (Fed. Cir. 1997). 

Thus, a claim describing a genus of nucleic acids by structure, formula, chemical name or physical 
properties sufficient to allow one of ordinary skill in the art to distinguish the genus from other materials 
meets the written description requirement of 35 U.S.C. § 112, first paragraph. As further elaborated by 
the Federal Circuit in Regents of Univ. of California v. Eli Lilly and Co.: 

In claims to genetic material ... a generic statement such as 'vertebrate insulin cDNA' or 
'mammalian insulin cDNA', without more, is not an adequate written description of the 
genus because it does not distinguish the claimed genus from others, except by function. 
It does not specifically define any of the genes that fall within its definition. It does not 
define any structural features commonly possessed by members of the genus that 
distinguish them from others. One skilled in the art cannot, as one can do with a fully 
described genus, visualize or recognize the identity of members of the genus. (Emphasis 
added) 

Thus, as opposed to the situation set forth in Regents of Univ. of California v. Eli Lilly and Co. and 
Fiers, the nucleic acid sequences of the present invention are not distinguished on the basis of function, or 
a method of isolation, but in fact are distinguished by structural features - a chemical formula, i.e., the 
sequence itself. 

Using the nucleic acid sequences of the present invention (as set forth in the Sequence Listing), the 
skilled artisan would readily be able to distinguish the claimed nucleic acids from other materials on the 
basis of the specific structural description provided. Polynucleotides comprising at least 24 contiguous 
bases from SEQ ID NO: 1 are within the genus of the instant claims, while those that lack this structural 
feature lie outside the genus. Importantly, the Final Action admitted that claims 1, 5 and 8 do in fact 
include a distinguishing feature, specifically, that the nucleic acid molecule must include "at least 24 
consecutive nucleotides of the polynucleotide of SEQ ID NO: 1" (the Final Action at page 7). Additionally, 
in the Advisory Action the Examiner agrees "that the claimed genus of polynucleotides is defined in structural 
terms" (the Advisory Action at page 7). Appellants respectfully point out that this is all that is required 
of claims 1, 5 and 8 to meet the written description requirement of 35 U.S.C. § 112, first paragraph. 

For each of the foregoing reasons, Appellants submit that the rejection of claims 1, 5 and 8 under 
35 U.S.C. § 112, first paragraph, must be overruled. 

E. Are Claims 1, 5 and 8 Enabled? 

The Final Action next rejected claims 1, 5 and 8 under 35 U.S.C. § 112, first paragraph, as 
allegedly not described in the specification in such a way as to enable one skilled in the art to make and/or 
use the invention. 

The Final Action stated that "this enablement rejection was applied due to the lack of information 
as to how one of skill in the art can reasonably make and use the polynucleotides, as encompassed by the 
claims" (the Final Action at page 9; emphasis in original). The Final Action once again contends that the 
specification provides insufficient guidance regarding the biological function or activity of certain of the 
claimed compositions. As set forth in Section VIII(D), above, the repeated citations of Broun et al. 
(supra), Van de Loo et al. (supra), Seffernick et al. (supra), and Witkowski et al. (supra), for the 
proposition that changes in protein sequence can lead to changes in protein function, are once again 
completely irrelevant to the present question of compliance with 35 U.S.C. § 112, first paragraph. 
Importantly, such an enablement standard conflicts with established patent law. 

Appellants point out that significant commercial exploitation of nucleic acid sequences requires no 
more information than the nucleic acid sequence itself. Applications ranging from gene expression analysis 
or profiling (utilizing, for example, arrays of short, overlapping or non-overlapping, oligonucleotides and 
DNA chips, as described in Section VIII(A), above) to chromosomal mapping (utilizing, for example, short 
oligonucleotide probes or full length DNA sequences, as described in Section VIII(A), above) are 
practiced utilizing nucleic acid sequences and techniques that are well-known to those of skill in the art. 
The widespread commercial exploitation of nucleic acid sequence information points to the level of skill in 
the art, and the enablement provided by disclosures such as the present specification, which include specific 
nucleic acid sequences and guidance regarding the various uses of such sequences. Thus, the skilled artisan 
can clearly make and use the claimed polynucleotides, which is all that is required to meet the enablement 
requirement under 35 U.S.C. § 112, first paragraph. 

The Examiner states that the present invention could not be practiced without "undue 
experimentation" (the Final Action bridging pages 10 and 11). However, it is important to remember that 
in assessing the question of whether undue experimentation would be required in order to practice the 
claimed invention, the key term is "undue", not "experimentation". In re Angstadt and Griffin, supra. In 
In re Wands (supra; "Wands"), the P.T.O. took the position that the applicant failed to demonstrate that 
the disclosed biological processes of immunization and antibody selection could reproducibly result in a 
useful biological product (antibodies from hybridomas) within the scope of the claims. In its decision 
overturning the P.T.O.'s rejection, the Federal Circuit found that Wands' demonstration of success in four 
out of nine cell lines screened was sufficient to support a conclusion of enablement. The court emphasized 
that the need for some experimentation requiring, e.g., production of the biological material followed by 
routine screening, was not a basis for a finding of non-enablement, stating: 

Disclosure in application for the immunoassay method patent does not fail to meet 
enablement requirement of 35 USC 112 by requiring 'undue experimentation,' even though 
production of monoclonal antibodies necessary to practice invention first requires 
production and screening of numerous antibody producing cells or 'hybridomas,' since 
practitioners of art are prepared to screen negative hybridomas in order to find those that 
produce desired antibodies, since in monoclonal antibody art one 'experiment' is not simply 
screening of one hybridoma but rather is entire attempt to make desired antibody, and 
since record indicates that amount of effort needed to obtain desired antibodies is not 
excessive, in view of Applicants' success in each attempt to produce antibody that satisfied 
all claim limitations. 

Wands at 1400. Thus, the need for some experimentation does not render the claimed invention 
unpatentable under 35 U.S.C. § 112, first paragraph. Indeed, a considerable amount of experimentation 
may be permissible if such experimentation is routinely practiced in the art. In re Angstadt and Griffin, 
supra; Amgen, Inc. v. Chugai Pharmaceutical Co., Ltd., supra. 

The Final Action questioned the teaching and guidance in the specification for certain aspects of 
the present invention. However, as discussed above, this requirement is completely misplaced. There is 
sufficient knowledge and technical skill in the art for a skilled artisan to be able to make and use the claimed 
DNA species in a number of different aspects of the invention entirely without further details in a patent 
specification. For example, it is not unreasonable to expect a Ph.D. level molecular biologist to be able to 
use the disclosed sequence to design oligonucleotide probes and primers and use them in, for example, 
PCR based screening and detection methods to obtain the described sequences and/or determine tissue 
expression patterns. Nevertheless, the present specification provides highly detailed descriptions of 
techniques that can be used to accomplish many different aspects of the claimed invention, including 
recombinant expression, site-specific mutagenesis, in situ hybridization, and large scale nucleic acid 
screening techniques, and properly incorporates by reference a montage of standard texts into the 
specification, such as Sambrook et al. (Molecular Cloning, A Laboratory Manual) and Ausubel et al. 
(Current Protocols in Molecular Biology) to provide even further guidance to the skilled artisan. 
Incorporation of material into the specification by reference is proper. Ex parte Schwarze, 151 USPQ 
426 (PTO Bd. App. 1966). The § 112, first paragraph rejection is thus prima facie improper: 

As a matter of patent office practice, then, a specification disclosure which contains a 
teaching of the manner and process of making and using the invention in terms which 
correspond in scope to those used in describing and defining the subject matter sought to 
be patented must be taken as in compliance with the enabling requirement of the first 
paragraph of § 112 unless there is reason to doubt the objective truth of the statements 
contained therein which must be relied on for enabling support. 

In re Marzocchi, supra, emphasis as in original. In any event, an alleged lack of express teaching is 
insufficient to support a first paragraph rejection where one of skill in the art would know how to perform 
techniques required to perform at least one aspect of the invention. As a matter of law, it is well settled that 
a patent need not disclose what is well known in the art. In re Wands, supra. In fact, it is preferable that 
what is well known in the art be omitted from the disclosure. Hybritech, Inc. v. Monoclonal Antibodies, 
Inc., supra. As standard molecular biological techniques are routine in the art, such protocols do not need
to be described in detail in the specification.

As discussed in In re Brana (supra, "Brana"), the Federal Circuit admonished the P.T.O. for
confusing "the requirements under the law for obtaining a patent with the requirements for obtaining
government approval to market a particular drug for human consumption". Brana at 1442. Furthermore,
a specification "need describe the invention only in such detail as to enable a person skilled in the most 
relevant art to make and use it." In re Naquin, 158 USPQ 317, 319 (CCPA 1968); emphasis added. 
The present claims are thus enabled as they are supported by a specification that provides sufficient 
description to enable the skilled person to make and use the invention as claimed. Appellants stress that 
enablement must be analyzed, not in a vacuum, but "as it would be interpreted by one possessing the 
ordinary level of skill in the pertinent art." In re Moore, 169 USPQ 236, 238 (CCPA 1971). 

It has long been established that claims are enabled by defining any practical use. In re Nelson, 
126 USPQ 242 (CCPA 1960); Cross v. Iizuka, supra. "The enablement requirement is met if the 
description enables any mode of making and using the invention." Johns Hopkins Univ. v. CellPro, Inc.,
47 USPQ2d 1705, 1719 (Fed. Cir. 1998), citing Engel Indus., Inc. v. Lockformer Co., 20 USPQ2d 
1300, 1304 (Fed. Cir. 1991). As described in detail above, the specification details numerous applications 
in which claimed nucleotide sequences can be used, for example, to track gene expression using gene chips. 
Further, since public domain nucleotide sequences that have not been associated with any particular 
biological function, let alone validated as coding sequences, are used every day in gene chip applications,
it defies logic that undue experimentation would be required to use the presently described nucleotide 
sequences, which have been biologically validated as coding sequences, in the very same gene chip 
applications.

Appellants therefore submit that the rejection of claims 1, 5 and 8 under 35 U.S.C. § 112, first
paragraph, must be overruled. 






IX. APPENDIX 

The claims involved in this appeal are as follows: 

1. (Amended) An isolated nucleic acid molecule comprising at least 24 contiguous bases of
nucleotide sequence from SEQ ID NO:1.

2. (Amended) An isolated nucleic acid molecule comprising a nucleotide sequence that: 

(a) encodes the amino acid sequence shown in SEQ ID NO:2; and 

(b) hybridizes under highly stringent conditions to the nucleotide sequence of SEQ ID 
NO:1 or the complement thereof.

3. (Amended) An isolated nucleic acid molecule according to Claim 1 wherein said nucleotide 
sequence is a cDNA sequence. 

4. An isolated nucleic acid molecule according to Claim 3 encoding the amino acid sequence 
described in SEQ ID NO:2. 

5. A recombinant expression vector comprising the isolated nucleic acid molecule of claim 1. 

6. The recombinant expression vector of claim 5, wherein said isolated nucleic acid molecule
encodes the amino acid sequence of SEQ ID NO:2. 

7. The recombinant expression vector of claim 6, wherein said isolated nucleic acid molecule 
comprises the nucleotide sequence of SEQ ID NO:1.

8. A host cell comprising the recombinant expression vector of claim 5. 
X. CONCLUSION

Appellants respectfully submit that, in light of the foregoing arguments, the Final Action's conclusions
that claims 1-8 lack a patentable utility and are unusable by the skilled artisan due to a lack of patentable
utility, that claim 2 is indefinite, and that claims 1, 5 and 8 lack sufficient written description and are not
enabled, are unwarranted. It is therefore requested that the Board overturn the Final Action's rejections.

Respectfully submitted, 



David W. Hibler Reg. No. 41,071 

Agent For Appellants 

LEXICON GENETICS INCORPORATED 
8800 Technology Forest Place 
The Woodlands, TX 77381 
(281) 863-3399 
Sbjct: 35041 tctctagtgtgtgggcaacctgtatactccagccgcgttgtaggtggccaggatgctgct 35100
THE HUMAN GENOME

A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of
the human genome was generated by the whole-genome shotgun sequencing
method. The 14.8-billion bp DNA sequence was generated over 9 months from
27,271,853 high-quality sequence reads (5.11-fold coverage of the genome)
from both ends of plasmid clones made from the DNA of five individuals. Two
assembly strategies, a whole-genome assembly and a regional chromosome
assembly, were used, each combining sequence data from Celera and the
publicly funded genome effort. The public data were shredded into 550-bp
segments to create a 2.9-fold coverage of those genome regions that had been
sequenced, without including biases inherent in the cloning and assembly
procedure used by the publicly funded group. This brought the effective coverage
in the assemblies to eightfold, reducing the number and size of gaps in
the final assembly over what would be obtained with 5.11-fold coverage. The
two assembly strategies yielded very similar results that largely agree with
independent mapping data. The assemblies effectively cover the euchromatic
regions of the human chromosomes. More than 90% of the genome is in
scaffold assemblies of 100,000 bp or more, and 25% of the genome is in
scaffolds of 10 million bp or larger. Analysis of the genome sequence revealed
26,588 protein-encoding transcripts for which there was strong corroborating
evidence and an additional ~12,000 computationally derived genes with mouse
matches or other weak supporting evidence. Although gene-dense clusters are
obvious, almost half the genes are dispersed in low G+C sequence separated
by large tracts of apparently noncoding sequence. Only 1.1% of the genome
is spanned by exons, whereas 24% is in introns, with 75% of the genome being
intergenic DNA. Duplications of segmental blocks, ranging in size up to chromosomal
lengths, are abundant throughout the genome and reveal a complex
evolutionary history. Comparative genomic analysis indicates vertebrate expansions
of genes associated with neuronal function, with tissue-specific developmental
regulation, and with the hemostasis and immune systems. DNA
sequence comparisons between the consensus sequence and publicly funded
genome data provided locations of 2.1 million single-nucleotide polymorphisms
(SNPs). A random pair of human haploid genomes differed at a rate of 1 bp per
1250 on average, but there was marked heterogeneity in the level of polymorphism
across the genome. Less than 1% of all SNPs resulted in variation in
proteins, but the task of determining which SNPs have functional consequences
remains an open challenge.



Decoding of the DNA that constitutes the
human genome has been widely anticipated
for the contribution it will make toward understanding
human evolution, the causation
of disease, and the interplay between the
environment and heredity in defining the human
condition. A project with the goal of
determining the complete nucleotide sequence
of the human genome was first formally
proposed in 1985 (1). In subsequent
years, the idea met with mixed reactions in
the scientific community (2). However, in
1990, the Human Genome Project (HGP) was
officially initiated in the United States under
the direction of the National Institutes of
Health and the U.S. Department of Energy
with a 15-year, $3 billion plan for completing
the genome sequence. In 1998 we announced
our intention to build a unique genome-sequencing
facility, to determine the sequence
of the human genome over a 3-year
period. Here we report the penultimate milestone
along the path toward that goal, a nearly
complete sequence of the euchromatic portion
of the human genome. The sequencing
was performed by a whole-genome random
shotgun method with subsequent assembly of
the sequenced segments.

The modern history of DNA sequencing
began in 1977, when Sanger reported his method
for determining the order of nucleotides of
DNA using chain-terminating nucleotide analogs
(3). In the same year, the first human gene
was isolated and sequenced (4). In 1986, Hood 
and co-workers (5) described an improvement 
in the Sanger sequencing method that included 
attaching fluorescent dyes to the nucleotides, 
which permitted them to be sequentially read 
by a computer. The first automated DNA se- 
quencer, developed by Applied Biosystems in 
California in 1987, was shown to be successful 
when the sequences of two genes were obtained
with this new technology (6). From early se- 
quencing of human genomic regions (7), it 
became clear that cDNA sequences (which are 
reverse-transcribed from RNA) would be es- 
sential to annotate and validate gene predictions 
in the human genome. These studies were the 
basis in part for the development of the ex- 
pressed sequence tag (EST) method of gene 
identification (8), which is a random selection, 
very high throughput sequencing approach to 
characterize cDNA libraries. The EST method 
led to the rapid discovery and mapping of hu- 
man genes (9). The increasing numbers of hu- 
man EST sequences necessitated the develop- 
ment of new computer algorithms to analyze 
large amounts of sequence data, and in 1993 at
The Institute for Genomic Research (TIGR), an 
algorithm was developed that permitted assem- 
bly and analysis of hundreds of thousands of 
ESTs. This algorithm permitted characteriza- 
tion and annotation of human genes on the basis 
of 30,000 EST assemblies (10). 

The complete 49-kbp bacteriophage lamb- 
da genome sequence was determined by a 
shotgun restriction digest method in 1982 
(11). When considering methods for sequencing
the smallpox virus genome in 1991 (12),
a whole-genome shotgun sequencing method 
was discussed and subsequently rejected ow- 
ing to the lack of appropriate software tools 
for genome assembly. However, in 1994, 
when a microbial genome-sequencing project 
was contemplated at TIGR, a whole-genome 
shotgun sequencing approach was considered 
possible with the TIGR EST assembly algo- 
rithm. In 1995, the 1.8-Mbp Haemophilus 
influenzae genome was completed by a 
whole-genome shotgun sequencing method 
(13). The experience with several subsequent 
genome-sequencing efforts established the
broad applicability of this approach (14, 15).

A key feature of the sequencing approach 
used for these megabase-size and larger genomes
was the use of paired-end sequences
(also called mate pairs), derived from sub- 
clone libraries with distinct insert sizes and 
cloning characteristics. Paired-end sequences 
are sequences 500 to 600 bp in length from 
both ends of double-stranded DNA clones of 
prescribed lengths. The success of using end 
sequences from long segments (18 to 20 kbp) 
of DNA cloned into bacteriophage lambda in 
assembly of the microbial genomes led to the 
suggestion (16) of an approach to simultaneously
map and sequence the human genome
by means of end sequences from 150-kbp
bacterial artificial chromosomes (BACs)
(17, 18). The end sequences spanned by 
known distances provide long-range continu- 
ity across the genome. A modification of the 
BAC end-sequencing (BES) method was ap- 
plied successfully to complete chromosome 2 
from the Arabidopsis thaliana genome (19). 

In 1997, Weber and Myers (20) proposed 
whole-genome shotgun sequencing of the 
human genome. Their proposal was not well 
received (21). However, by early 1998, as 
less than 5% of the genome had been se- 
quenced, it was clear that the rate of progress 
in human genome sequencing worldwide 
was very slow (22), and the prospects for 
finishing the genome by the 2005 goal were 
uncertain. 

In early 1998, PE Biosystems (now Applied 
Biosystems) developed an automated, high- 
throughput capillary DNA sequencer, subsequently
called the ABI PRISM 3700 DNA
Analyzer. Discussions between PE Biosystems
and TIGR scientists resulted in a plan to undertake
the sequencing of the human genome with
the 3700 DNA Analyzer and the whole-genome 
shotgun sequencing techniques developed at 
TIGR (23). Many of the principles of operation 
of a genome-sequencing facility were established
in the TIGR facility (24). However, the
facility envisioned for Celera would have a 
capacity roughly 50 times that of TIGR, and 
thus new developments were required for sam- 
ple preparation and tracking and for whole- 
genome assembly. Some argued that the re- 
quired 150-fold scale-up from the H. influenzae 
genome to the human genome with its complex 
repeat sequences was not feasible (25). The 
Drosophila melanogaster genome was thus 
chosen as a test case for whole-genome assem- 
bly on a large and complex eukaryotic genome. 
In collaboration with Gerald Rubin and the 
Berkeley Drosophila Genome Project, the nu- 
cleotide sequence of the 120-Mbp euchromatic 
portion of the Drosophila genome was deter- 
mined over a 1-year period (26-28). The Dro- 
sophila genome-sequencing effort resulted in 
two key findings: (i) that the assembly algo- 
rithms could generate chromosome assemblies 
with highly accurate order and orientation with 
substantially less than 10-fold coverage, and (ii) 
that undertaking multiple interim assemblies in 
place of one comprehensive final assembly was 
not of value.

These findings, together with the dramatic
changes in the public genome effort subsequent
to the formation of Celera (29), led to a modified
whole-genome shotgun sequencing approach
to the human genome. We initially proposed
to do 10-fold sequence coverage of the
genome over a 3-year period and to make interim
assembled sequence data available quarterly.
The modifications included a plan to perform
random shotgun sequencing to ~5-fold
coverage and to use the unordered and unoriented
BAC sequence fragments and subassemblies
published in GenBank by the publicly
funded genome effort (30) to accelerate the 
project. We also abandoned the quarterly an- 
nouncements in the absence of interim assem- 
blies to report. 

Although this strategy provided a reasonable
result very early that was consistent with a
whole-genome shotgun assembly with eightfold
coverage, the human genome sequence is
not as finished as the Drosophila genome was
with an effective 13-fold coverage. However, it
became clear that even with this reduced coverage
strategy, Celera could generate an accurately
ordered and oriented scaffold sequence of
the human genome in less than 1 year. Human
genome sequencing was initiated 8 September
1999 and completed 17 June 2000. The first
assembly was completed 25 June 2000, and the
assembly reported here was completed 1 October
2000. Here we describe the whole-genome
random shotgun sequencing effort applied to
the human genome. We developed two different
assembly approaches for assembling the ~3
billion bp that make up the 23 pairs of chromosomes
of the Homo sapiens genome. Any GenBank-derived
data were shredded to remove
potential bias to the final sequence from chimeric
clones, foreign DNA contamination, or
misassembled contigs. Insofar as a correctly
and accurately assembled genome sequence
with faithful order and orientation of contigs 
is essential for an accurate analysis of the 
human genetic code, we have devoted a con- 
siderable portion of this manuscript to the 
documentation of the quality of our recon- 
struction of the genome. We also describe our 
preliminary analysis of the human genetic 
code on the basis of computational methods. 
Figure 1 (see fold-out chart associated with
this issue; files for each chromosome can be
found in Web fig. 1 on Science Online at
www.sciencemag.org/cgi/content/full/291/5507/1304/DC1)
provides a graphical overview
of the genome and the features encoded
in it. The detailed manual curation and inter- 
pretation of the genome are just beginning. 

To aid the reader in locating specific an- 
alytical sections, we have divided the paper 
into seven broad sections. A summary of the 
major results appears at the beginning of each 
section. 



Sources of DNA and Sequencing Methods 
Genome Assembly Strategy and 
Characterization 
Gene Prediction and Annotation 
Genome Structure 
Genome Evolution 
Coding Genes in the Human Genome
Conclusions 



1 Sources of DNA and Sequencing Methods

Summary. This section discusses the rationale
and ethical rules governing donor selection to
ensure ethnic and gender diversity along with
the methodologies for DNA extraction and library
construction. The plasmid library construction
is the first critical step in shotgun
sequencing. If the DNA libraries are not uniform
in size, nonchimeric, and do not randomly
represent the genome, then the subsequent steps
cannot accurately reconstruct the genome sequence.
We used automated high-throughput
DNA sequencing and the computational infrastructure
to enable efficient tracking of enormous
amounts of sequence information (27.3
million sequence reads; 14.9 billion bp of sequence).
Sequencing and tracking from both
ends of plasmid clones from 2-, 10-, and 50-kbp
libraries were essential to the computational
reconstruction of the genome. Our evidence
indicates that the accurate pairing rate of end
sequences was greater than 98%.



Various policies of the United States and the
World Medical Association, specifically the
Declaration of Helsinki, offer recommendations
for conducting experiments with human
subjects. We convened an Institutional Review
Board (IRB) (31) that helped us establish
the protocol for obtaining and using human
DNA and the informed consent process
used to enroll research volunteers for the
DNA-sequencing studies reported here. We
adopted several steps and procedures to protect
the privacy rights and confidentiality of
the research subjects (donors). These included
a two-stage consent process, a secure random
alphanumeric coding system for specimens
and records, circumscribed contact with
the subjects by researchers, and options for
off-site contact of donors. In addition, Celera
applied for and received a Certificate of Confidentiality
from the Department of Health
and Human Services. This Certificate authorized
Celera to protect the privacy of the
individuals who volunteered to be donors as
provided in Section 301(d) of the Public
Health Service Act, 42 U.S.C. 241(d).

Celera and the IRB believed that the ini- 
tial version of a completed human genome 
should be a composite derived from multiple 
donors of diverse ethnic backgrounds. Prospective
donors were asked, on a voluntary
basis, to self-designate an ethnogeographic 
category (e.g., African-American, Chinese, 
Hispanic, Caucasian, etc.). We enrolled 21 
donors (32). 

Three basic items of information from 
each donor were recorded and linked by con- 
fidential code to the donated sample: age, 
sex, and self-designated ethnogeographic
group. From females, ~130 ml of whole,
heparinized blood was collected. From males,
~130 ml of whole, heparinized blood was
collected, as well as five specimens of semen,
collected over a 6-week period. Permanent
lymphoblastoid cell lines were created by
Epstein-Barr virus immortalization. DNA
from five subjects was selected for genomic
DNA sequencing: two males and three females
(one African-American, one Asian-Chinese,
one Hispanic-Mexican, and two
Caucasians; see Web fig. 2 on Science Online
at www.sciencemag.org/cgi/content/291/5507/1304/DC1).
The decision of whose DNA to
sequence was based on a complex mix of factors,
including the goal of achieving diversity as
well as technical issues such as the quality of
the DNA libraries and availability of immortalized
cell lines.

1.1 Library construction and 
sequencing 

Central to the whole-genome shotgun sequenc- 
ing process is preparation of high-quality plas- 
mid libraries in a variety of insert sizes so that 
pairs of sequence reads (mates) are obtained, 
one read from both ends of each plasmid insert. 
High-quality libraries have an equal representa- 
tion of all parts of the genome, a small number 
of clones without inserts, and no contamination 
from such sources as the mitochondrial genome 
and Escherichia coli genomic DNA. DNA from 
each donor was used to construct plasmid libraries
in one or more of three size classes: 2 kbp, 10
kbp, and 50 kbp (Table 1) (33).

In designing the DNA-sequencing pro- 
cess, we focused on developing a simple 
system that could be implemented in a robust 
and reproducible manner and monitored ef- 
fectively (Fig. 2) (34). 

Current sequencing protocols are based on
the dideoxy sequencing method (35), which
typically yields only 500 to 750 bp of sequence
per reaction. This limitation on read length has
made monumental gains in throughput a prerequisite
for the analysis of large eukaryotic
genomes. We accomplished this at the Celera
facility, which occupies about 30,000 square
feet of laboratory space and produces sequence
data continuously at a rate of 175,000 total
reads per day. The DNA-sequencing facility is
supported by a high-performance computational
facility (36).

The process for DNA sequencing was modular
by design and automated. Intermodule
sample backlogs allowed four principal
modules to operate independently: (i) library
transformation, plating, and colony
picking; (ii) DNA template preparation;
(iii) dideoxy sequencing reaction set-up
and purification; and (iv) sequence determination
with the ABI PRISM 3700 DNA
Analyzer. Because the inputs and outputs
of each module have been carefully
matched and sample backlogs are continuously
managed, sequencing has proceeded
without a single day's interruption since the
initiation of the Drosophila project in May
1999. The ABI 3700 is a fully automated
capillary array sequencer and as such can
be operated with a minimal amount of
hands-on time, currently estimated at about
15 min per day. The capillary system also
facilitates correct associations of sequencing
traces with samples through the elimination
of manual sample loading and lane-tracking
errors associated with slab gels.
About 65 production staff were hired and
trained, and were rotated on a regular basis
through the four production modules. A
central laboratory information management
system (LIMS) tracked all sample plates by
unique bar code identifiers. The facility was
supported by a quality control team that performed
raw material and in-process testing
and a quality assurance group with responsibilities
including document control, validation,
and auditing of the facility. Critical to
the success of the scale-up was the validation
of all software and instrumentation before
implementation, and production-scale testing
of any process changes.

1.2 Trace processing 

An automated trace-processing pipeline has
been developed to process each sequence file
(37). After quality and vector trimming, the
average trimmed sequence length was 543
bp, and the sequencing accuracy was exponentially
distributed with a mean of 99.5%
and with less than 1 in 1000 reads being less
than 98% accurate (26). Each trimmed sequence
was screened for matches to contaminants
including sequences of vector alone, E.
coli genomic DNA, and human mitochondrial
DNA. The entire read for any sequence
with a significant match to a contaminant was
discarded. A total of 713 reads matched E.
coli genomic DNA and 2114 reads matched
the human mitochondrial genome.

1.3 Quality assessment and control

The importance of the base-pair level accuracy
of the sequence data increases as the
size and repetitive nature of the genome to
be sequenced increases. Each sequence
read must be placed uniquely in the ge-



Table 1. Celera-generated data input into assembly.

                                       Number of reads for different insert libraries
                          Individual         2 kbp      10 kbp      50 kbp       Total  Total number of base pairs
No. of sequencing reads   A                      0           0   2,767,357   2,767,357               1,502,674,851
                          B             11,736,757   7,467,755      66,930  19,271,442              10,464,393,006
                          C                853,819     881,290           0   1,735,109                 942,164,187
                          D                952,523   1,046,815           0   1,999,338               1,085,640,534
                          F                      0   1,498,607           0   1,498,607                 813,743,601
                          Total         13,543,099  10,894,467   2,834,287  27,271,853              14,808,616,179

Fold sequence coverage    A                      0           0        0.52        0.52
(2.9-Gb genome)           B                   2.20        1.40        0.01        3.61
                          C                   0.16        0.17           0        0.32
                          D                   0.18        0.20           0        0.37
                          F                      0        0.28           0        0.28
                          Total               2.54        2.04        0.53        5.11

Fold clone coverage       A                      0           0       18.39       18.39
                          B                   2.96       11.26        0.44       14.67
                          C                   0.22        1.33           0        1.54
                          D                   0.24        1.58           0        1.82
                          F                      0        2.26           0        2.26
                          Total               3.42       16.43       18.84       38.68

Insert size* (mean)       Average         1,951 bp   10,800 bp   50,715 bp
Insert size* (SD)         Average            6.10%       8.10%      14.90%
% Mates†                  Average            74.50       80.80       75.60

*Insert size and SD are calculated from assembly of mates on contigs.
†% Mates is based on laboratory tracking of sequencing runs.

nome, and even a modest error rate can
reduce the effectiveness of assembly. In 
addition, maintaining the validity of mate- 
pair information is absolutely critical for 
the algorithms described below. Procedural 
controls were established for maintaining 
the validity of sequence mate-pairs as se- 
quencing reactions proceeded through the 
process, including strict rules built into the 
LIMS. The accuracy of sequence data pro- 
duced by the Celera process was validated 
in the course of the Drosophila genome 
project (26). By collecting data for the
entire human genome in a single facility,
we were able to ensure uniform quality 
standards and the cost advantages associat- 
ed with automation, an economy of scale, 
and process consistency. 

2 Genome Assembly Strategy and 
Characterization 

Summary. We describe in this section the two
approaches that we used to assemble the genome.
One method involves the computational
combination of all sequence reads with shredded
data from GenBank to generate an independent,
nonbiased view of the genome. The second
approach involves clustering all of the fragments
to a region or chromosome on the basis
of mapping information. The clustered data
were then shredded and subjected to computational
assembly. Both approaches provided essentially
the same reconstruction of assembled
DNA sequence with proper order and orientation.
The second method provided slightly
greater sequence coverage (fewer gaps) and
was the principal sequence used for the analysis
phase. In addition, we document the completeness
and correctness of this assembly process

dures, with a focus on quality within and across departments. Each
process has defined inputs and outputs with the capability to exchange
samples and data with both internal and external entities according to
defined quality guidelines. Manufacturing pipeline processes, products,
quality control measures, and responsible parties are indicated and are
described further in the text.

and provide a comparison to the public genome
sequence, which was reconstructed largely by 
an independent BAC-by-BAC approach. Our 
assemblies effectively covered the euchromatic 
regions of the human chromosomes. More than 
90% of the genome was in scaffold assemblies 
of 100,000 bp or greater, and 25% of the ge- 
nome was in scaffolds of 10 million bp or 
larger. 
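
The shredding step mentioned in this summary can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the 550-bp fragment length is the segment size quoted in the abstract, while the function name shred, the uniform tiling, and the depth parameter are hypothetical choices made here for illustration and are not taken from the article.

```python
def shred(sequence, fragment_len=550, depth=2):
    """Cut a sequence into overlapping fixed-length fragments.

    fragment_len: 550 bp is the segment size quoted in the abstract.
    depth: hypothetical tiling depth (an assumption of this sketch);
           consecutive fragments start fragment_len // depth bases apart.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if len(sequence) <= fragment_len:
        # Too short to shred: return the sequence as a single fragment.
        return [sequence]
    step = max(1, fragment_len // depth)
    last_start = len(sequence) - fragment_len
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        # Add one final fragment so the tail of the sequence is covered.
        starts.append(last_start)
    return [sequence[s:s + fragment_len] for s in starts]


# Example: shred a 2,000-base toy sequence.
toy_bactig = "ACGT" * 500
fragments = shred(toy_bactig)
```

Every fragment has the same length, so downstream code can treat shredded public data in the same way as ordinary sequence reads.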

Shotgun sequence assembly is a classic 
example of an inverse problem: given a set 
of reads randomly sampled from a target
sequence, reconstruct the order and the po- 
sition of those reads in the target. Genome 
assembly algorithms developed for Dro- 
sophila have now been extended to assemble 
the ~25-fold larger human genome. Celera assemblies
consist of a set of contigs that are
ordered and oriented into scaffolds that are then 
mapped to chromosomal locations by using 
known markers. The contigs consist of a col- 
lection of overlapping sequence reads that pro- 
vide a consensus reconstruction for a contigu- 
ous interval of the genome. Mate pairs are a 
central component of the assembly strategy. 
They are used to produce scaffolds in which the 
size of gaps between consecutive contigs is 
known with reasonable precision. This is ac- 
complished by observing that a pair of reads, 
one of which is in one contig, and the other of 
which is in another, implies an orientation and 
distance between the two contigs (Fig. 3). Fi- 
nally, our assemblies did not incorporate all 
reads into the final set of reported scaffolds. 
This set of unincorporated reads is termed 
"chaff," and typically consisted of reads from 
within highly repetitive regions, data from other 
organisms introduced through various routes as 
found in many genome projects, and data of 
poor quality or with untrimmed vector. 
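
The way a mate pair implies a distance between two contigs can be sketched in a few lines of Python. This is an illustrative sketch, not the Celera assembler: the names MateLink, implied_gap, and estimate_gap are hypothetical, and the sketch assumes both contigs are already oriented so that one read of each pair lies forward in the left contig and its mate lies on the reverse strand of the right contig.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class MateLink:
    """One mate pair bridging contig A (left) and contig B (right)."""
    start_in_a: int     # 0-based start of the forward read in contig A
    end_in_b: int       # end (exclusive) of the reverse read in contig B
    insert_mean: float  # mean insert size of the library the pair came from


def implied_gap(link, len_a):
    """Gap implied by a single mate pair.

    The clone insert spans from the read start in A to the read end in B,
    so the gap is the insert size minus the parts lying inside each contig.
    """
    inside_a = len_a - link.start_in_a
    inside_b = link.end_in_b
    return link.insert_mean - inside_a - inside_b


def estimate_gap(links, len_a):
    """Average the gap estimates from all mate pairs bridging A and B."""
    return mean(implied_gap(link, len_a) for link in links)


# Example: two 2-kbp mate pairs bridging a 5,000-bp contig and its neighbor.
links = [MateLink(4000, 600, 2000), MateLink(4500, 1100, 2000)]
gap = estimate_gap(links, len_a=5000)
```

A negative estimate would suggest that the two contigs overlap rather than being separated by a gap. Averaging over several pairs narrows the uncertainty that comes from the spread of insert sizes within a library.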
2.1 Assembly data sets

We used two independent sets of data for our
assemblies. The first was a random shotgun
data set of 27.27 million reads of average length
543 bp produced at Celera. This consisted
largely of mate-pair reads from 16 libraries
constructed from DNA samples taken from five
different donors. Libraries with insert sizes of 2,
10, and 50 kbp were used. By looking at how
mate pairs from a library were positioned in
known sequenced stretches of the genome, we
were able to characterize the range of insert
sizes in each library and determine a mean and
standard deviation. Table 1 details the number
of reads, sequencing coverage, and clone coverage
achieved by the data set. The clone coverage
is the coverage of the genome in cloned
DNA, considering the entire insert of each
clone that has sequence from both ends. The
clone coverage provides a measure of the
amount of physical DNA coverage of the genome.
Assuming a genome size of 2.9 Gbp, the
Celera trimmed sequences gave a 5.1X coverage
of the genome, and clone coverage was
3.42X, 16.40X, and 18.84X for the 2-, 10-, and
50-kbp libraries, respectively, for a total of
38.7X clone coverage.
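
The coverage figures in this paragraph follow from simple arithmetic, sketched below in Python. The read counts, mate percentages, and mean insert sizes are taken from Table 1. The clone-coverage formula (mated clones times mean insert size, divided by genome size) is an assumption of this sketch based on the definition given above. It lands close to the reported values but does not reproduce them exactly, so the article's own calculation may differ in detail.

```python
GENOME_SIZE_BP = 2.9e9  # genome size assumed in the text


def sequence_coverage(num_reads, mean_read_len_bp, genome_size_bp=GENOME_SIZE_BP):
    """Fold sequence coverage: total sequenced bases over genome size."""
    return num_reads * mean_read_len_bp / genome_size_bp


def clone_coverage(num_reads, mate_fraction, mean_insert_bp,
                   genome_size_bp=GENOME_SIZE_BP):
    """Fold clone coverage under this sketch's assumption.

    A clone counts only if both of its end reads are present (a mate pair),
    so the number of mated clones is num_reads * mate_fraction / 2.
    """
    mated_clones = num_reads * mate_fraction / 2
    return mated_clones * mean_insert_bp / genome_size_bp


# (reads, fraction of reads with a mate, mean insert size in bp), from Table 1
LIBRARIES = {
    "2 kbp": (13_543_099, 0.745, 1_951),
    "10 kbp": (10_894_467, 0.808, 10_800),
    "50 kbp": (2_834_287, 0.756, 50_715),
}

seq_cov = sequence_coverage(27_271_853, 543)
clone_cov = {name: clone_coverage(*values) for name, values in LIBRARIES.items()}
```

The 50-kbp library contributes about a tenth of the reads but roughly half of the clone coverage, which is why long-insert mate pairs matter so much for long-range ordering.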

The second data set was from the publicly 
funded Human Genome Project (PFP) and is 
primarily derived from BAC clones (30). The 
BAC data input to the assemblies came from a
download of GenBank on 1 September 2000
(Table 2) totaling 4443.3 Mbp of sequence.
The data for each BAC is deposited at one of
four levels of completion. Phase 0 data are a set
of generally unassembled sequencing reads
from a very light shotgun of the BAC, typically
less than 1X. Phase 1 data are unordered assemblies
of contigs, which we call BAC contigs
or bactigs. Phase 2 data are ordered assemblies
of bactigs. Phase 3 data are complete BAC
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Fig. 3. Anatomy of whole-genome assembly. Overlapping shredded bactig fragments (red lines) and 
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contig and a consensus sequence (green line). Contigs are connected into scaffolds (red) by using 
mate pair information. Scaffolds are then mapped to the genome (gray line) with STS (blue star) 
physical map information. 



sequences. In the past 2 years the PFP has
focused on a product of lower quality and
completeness, but on a faster time-course, by
concentrating on the production of Phase 1 data
from a 3X to 4X light-shotgun of each BAC
clone.

We screened the bactig sequences for
contaminants by using the BLAST algorithm
against three data sets: (i) vector sequences
in Univec core (38), filtered for a 25-bp
match at 98% sequence identity at the ends
of the sequence and a 30-bp match internal
to the sequence; (ii) the nonhuman portion
of the High Throughput Genomic (HTG)
Sequences division of GenBank (39),
filtered at 200 bp at 98%; and (iii) the
nonredundant nucleotide sequences from
GenBank without primate and human virus
entries, filtered at 200 bp at 98%. Whenever
25 bp or more of vector was found within
50 bp of the end of a contig, the tip up to
the matching vector was excised. Under
these criteria we removed 2.6 Mbp of
possible contaminant and vector from the
Phase 3 data, 61.0 Mbp from the Phase 1
and 2 data, and 16.1 Mbp from the Phase 0
data (Table 2). This left us with a total of
4363.7 Mbp of PFP sequence data: 20%
finished, 75% rough-draft (Phase 1 and 2),
and 5% single sequencing reads (Phase 0).
An additional 104,018 BAC end-sequence
mate pairs were also downloaded and
included in the data sets for both assembly
processes (18).
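
The tip-excision rule described above can be sketched as follows. This is
only an illustration under stated assumptions: vector hits are taken as
already computed (for example by BLAST) and given as half-open intervals,
the matched vector itself is assumed to be removed along with the tip, and
the function and parameter names are hypothetical.

```python
def trim_vector_tips(contig_len, vector_hits, min_match=25, end_window=50):
    """Return the (start, end) interval of the contig kept after tip excision.

    vector_hits: list of (hit_start, hit_end) half-open intervals on the contig.
    A hit of at least min_match bp lying within end_window bp of a contig end
    causes that tip, through the hit, to be dropped (assumption).
    """
    start, end = 0, contig_len
    for hit_start, hit_end in vector_hits:
        if hit_end - hit_start < min_match:
            continue  # too short to count as vector under the stated rule
        if hit_start < end_window:
            start = max(start, hit_end)   # excise the left tip through the hit
        if contig_len - hit_end < end_window:
            end = min(end, hit_start)     # excise the right tip from the hit on
    return start, max(start, end)

kept = trim_vector_tips(1000, [(10, 40), (960, 990)])
```

Internal hits, and hits shorter than 25 bp, leave the contig untouched in
this sketch.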

2.2 Assembly strategies

Two different approaches to assembly were 
pursued. The first was a whole-genome as- 
sembly process that used Celera data and the 
PFP data in the form of additional synthetic 
shotgun data, and the second was a compart- 
mentalized assembly process that first parti- 
tioned the Celera and PFP data into sets 
localized to large chromosomal segments and 
then performed ab initio shotgun assembly on 
each set. Figure 4 gives a schematic of the 
overall process flow. 

For the whole-genome assembly, the PFP
data was first disassembled or "shredded" into a
synthetic shotgun data set of 550-bp reads that
form a perfect 2X covering of the bactigs. This
resulted in 16.05 million "faux" reads that were
sufficient to cover the genome 2.96X because
of redundancy in the BAC data set, without
incorporating the biases inherent in the PFP
assembly process. The combined data set of
43.32 million reads (8X), and all associated
mate-pair information, were then subjected to
our whole-genome assembly algorithm to
produce a reconstruction of the genome. Neither
the location of a BAC in the genome nor its
assembly of bactigs was used in this process.
Bactigs were shredded into reads because we
found strong evidence that 2.13% of them were
misassembled (40). Furthermore, BAC location
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information was ignored because some BACs 
were not correctly placed on the PFP physical 
map and because we found strong evidence that 

Table 2. GenBank data input into assembly. 
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at least 2.2% of the BACs contained sequence 
data that were not part of the given BAC (41), 
possibly as a result of sample-tracking errors 



Columns: Center; Statistics; Completion phase sequence.

Centers: Whitehead Institute/MIT Center for Genome Research, USA;
Washington University, USA; Baylor College of Medicine, USA;
Production Sequencing Facility, DOE Joint Genome Institute, USA;
The Institute of Physical and Chemical Research (RIKEN), Japan;
Sanger Centre, UK; Others*; All centers combined†.

Statistics reported for each center: Number of accession records;
Number of contigs; Total base pairs; Total vector masked (bp);
Total contaminant masked (bp); Average contig length (bp).
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•Other centers contributing at least 0.1% of the sequence include: Chinese National Human Genome Center; 
Genomanalyse Gesellschaft fuer Biotechnologische Forschung mbH; Genome Therapeutics Corporation; GENOSCOPE;
Chinese Academy of Sciences; Institute of Molecular Biotechnology; Keio University School of Medicine; Lawrence
Livermore National Laboratory; Cold Spring Harbor Laboratory; Los Alamos National Laboratory; Max-Planck Institut fuer
Molekulare Genetik; Japan Science and Technology Corporation; Stanford University; The Institute for Genomic
Research; The Institute of Physical and Chemical Research, Gene Bank; The University of Oklahoma; University of Texas
Southwestern Medical Center; University of Washington. †The 4,405,700,825 bases contributed by all centers were
shredded into faux reads resulting in 2.96X coverage of the genome.



(see below). In short, we performed a true, ab
initio whole-genome assembly in which we
took the expedient of deriving additional
sequence coverage, but not mate pairs, assembled
bactigs, or genome locality, from some
externally generated data.
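
The shredding of bactigs into 550-bp faux reads with 2X covering can be
sketched as below. This is an illustration, not the authors' code: the
tiling step of half a read length, the handling of short bactigs, and the
extra read anchored at the right end are all assumptions, and the function
name is hypothetical.

```python
def shred(seq, read_len=550, coverage=2):
    """Cut a bactig into overlapping faux reads.

    Reads start every read_len // coverage bases, so interior bases are
    covered `coverage` times. A final read is anchored at the right end so
    the tail is not lost (assumption). Short bactigs yield a single read.
    """
    step = read_len // coverage
    if len(seq) <= read_len:
        return [seq]
    starts = list(range(0, len(seq) - read_len + 1, step))
    if starts[-1] + read_len < len(seq):
        starts.append(len(seq) - read_len)
    return [seq[s:s + read_len] for s in starts]

bactig = "ACGT" * 300          # toy 1200-bp bactig
faux_reads = shred(bactig)
```

Because each faux read carries no mate pair and no record of the bactig it
came from, the assembler is free to place it independently.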

In the compartmentalized shotgun assembly
(CSA), Celera and PFP data were partitioned
into the largest possible chromosomal segments
or "components" that could be determined with
confidence, and then shotgun assembly was
applied to each partitioned subset wherein the
bactig data were again shredded into faux reads
to ensure an independent ab initio assembly of
the component. By subsetting the data in this
way, the overall computational effort was
reduced and the effect of interchromosomal
duplications was ameliorated. This also resulted in a
reconstruction of the genome that was relatively
independent of the whole-genome assembly
results so that the two assemblies could be
compared for consistency. The quality of the
partitioning into components was crucial so that
different genome regions were not mixed
together. We constructed components from (i) the
longest scaffolds of the sequence from each
BAC and (ii) assembled scaffolds of data unique
to Celera's data set. The BAC assemblies were
obtained by a combining assembler that used the
bactigs and the 5X Celera data mapped to those
bactigs as input. This effort was undertaken as
an interim step solely because the more accurate
and complete the scaffold for a given sequence
stretch, the more accurately one can tile those
scaffolds into contiguous components on the
basis of sequence overlap and mate-pair
information. We further visually inspected and
curated the scaffold tiling of the components to
further increase its accuracy. For the final CSA
assembly, all but the partitioning was ignored,
and an independent, ab initio reconstruction of
the sequence in each component was obtained
by applying our whole-genome assembly
algorithm to the partitioned, relevant Celera data and
the shredded, faux reads of the partitioned,
relevant bactig data.

2.3 Whole-genome assembly

The algorithms used for whole-genome
assembly (WGA) of the human genome were
enhancements to those used to produce the
sequence of the Drosophila genome reported
in detail in (28).

The WGA assembler consists of a pipeline
composed of five principal stages: Screener,
Overlapper, Unitigger, Scaffolder, and Repeat
Resolver, respectively. The Screener finds
and marks all microsatellite repeats with less
than a 6-bp element, and screens out all
known interspersed repeat elements,
including Alu, Line, and ribosomal DNA. Marked
regions get searched for overlaps, whereas
screened regions do not get searched, but can
be part of an overlap that involves unscreened
matching segments.
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The Overlapper compares every read 
against every other read in search of complete 
end-to-end overlaps of at least 40 bp and with 
no more than 6% differences in the match. 
Because all data are scrupulously vector- 
trimmed, the Overlapper can insist on com- 
plete overlap matches. Computing the set of 
all overlaps took roughly 10,000 CPU hours 
with a suite of four-processor Alpha SMPs 
with 4 gigabytes of RAM. This took 4 to 5 
days in elapsed time with 40 such machines 
operating in parallel.
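
The overlap criterion (an end-to-end match of at least 40 bp with no more
than 6% differences) can be illustrated with a deliberately naive sketch.
It is not the Overlapper itself: it only counts substitutions, ignores
insertions and deletions, checks a single orientation, and runs in
quadratic time per read pair. The function name and parameters are
hypothetical.

```python
def best_overlap(a, b, min_len=40, max_diff=0.06):
    """Longest suffix of `a` that matches a prefix of `b` within max_diff.

    Returns (overlap_length, mismatches), or None if no overlap of at
    least min_len bases satisfies the mismatch bound.
    """
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-length:], b[:length]
        mismatches = sum(x != y for x, y in zip(suffix, prefix))
        if mismatches <= max_diff * length:
            return length, mismatches
    return None

core = "ACGTTGCAAC" * 5            # 50-bp shared segment
read_a = "G" * 30 + core
read_b = core + "T" * 30
overlap = best_overlap(read_a, read_b)
```

A production overlapper would use seed-and-extend indexing rather than
testing every suffix length, which is what makes the all-against-all
comparison feasible at this scale.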

Every overlap computed above is statistically
a 1-in-10^17 event and thus not a coincidental
event. What makes assembly combinatorially
difficult is that while many overlaps
are actually sampled from overlapping
regions of the genome, and thus imply that
the sequence reads should be assembled
together, even more overlaps are actually from
two distinct copies of a low-copy repeated
element not screened above, thus constituting
an error if put together. We call the former
"true overlaps" and the latter "repeat-induced
overlaps." The assembler must avoid choosing
repeat-induced overlaps, especially early
in the process.

We achieve this objective in the Unitigger.
We first find all assemblies of reads that
appear to be uncontested with respect to all
other reads. We call the contigs formed from
these subassemblies unitigs (for uniquely
assembled contigs). Formally, these unitigs are
the uncontested interval subgraphs of the
graph of all overlaps (42). Unfortunately,
although empirically many of these assemblies
are correct (and thus involve only true
overlaps), some are in fact collections of reads
from several copies of a repetitive element
that have been overcollapsed into a single
subassembly. However, the overcollapsed
unitigs are easily identified because their
average coverage depth is too high to be
consistent with the overall level of sequence
coverage. We developed a simple statistical
discriminator that gives the logarithm of the
odds ratio that a unitig is composed of unique
DNA or of a repeat consisting of two or more
copies. The discriminator, set to a sufficiently
stringent threshold, identifies a subset of the
unitigs that we are certain are correct. In
addition, a second, less stringent threshold
identifies a subset of remaining unitigs very
likely to be correctly assembled, of which we
select those that will consistently scaffold
(see below), and thus are again almost certain
to be correct. We call the union of these two
sets U-unitigs. Empirically, we found from a
6X simulated shotgun of human chromosome
22 that we get U-unitigs covering 98% of the
stretches of unique DNA that are >2 kbp
long. We are further able to identify the
boundary of the start of a repetitive element
at the ends of a U-unitig and leverage this so
that U-unitigs span more than 93% of all
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ngly interspersed Alu elements and other 
100-to 400-bp repetitive segments. 
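
One way to see why coverage depth separates unique unitigs from
overcollapsed ones is a simple Poisson model. The sketch below is an
assumption for illustration, not necessarily the discriminator the authors
used: if reads start uniformly at random, a unique unitig of length L is
expected to contain lam = L * F / G read starts (F reads, genome size G),
while a collapsed two-copy repeat is expected to contain 2 * lam. The log
of the ratio of the two Poisson probabilities for k observed reads
simplifies to lam - k * ln 2. All names are hypothetical.

```python
import math

def unique_log_odds(unitig_len, n_reads, total_reads, genome_size):
    """Natural-log odds that a unitig is unique rather than a 2-copy repeat.

    Derived from Poisson(k; lam) / Poisson(k; 2 * lam) = exp(lam) / 2**k,
    where lam is the expected read count for a unique unitig.
    """
    lam = unitig_len * total_reads / genome_size
    return lam - n_reads * math.log(2)

# Toy numbers: a 10-kbp unitig within the combined 43.32 million reads
# and a 2.9-Gbp genome (about 149 reads expected if unique).
score_unique = unique_log_odds(10_000, 150, 43.32e6, 2.9e9)
score_repeat = unique_log_odds(10_000, 300, 43.32e6, 2.9e9)
```

A unitig holding roughly the expected number of reads scores positive,
while one holding about twice as many scores negative. The thresholds
applied to such a score are not given in the text and are not assumed here.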

The result of running the Unitigger was
thus a set of correctly assembled subcontigs
covering an estimated 73.6% of the human
genome. The Scaffolder then proceeded to
use mate-pair information to link these
together into scaffolds. When there are two or
more mate pairs that imply that a given pair
of U-unitigs are at a certain distance and
orientation with respect to each other, the
probability of this being wrong is again
roughly 1 in 10^10, assuming that mate pairs
are false less than 2% of the time. Thus, one
can with high confidence link together all
U-unitigs that are linked by at least two 2- or
10-kbp mate pairs producing intermediate-sized
scaffolds that are then recursively
linked together by confirming 50-kbp mate
pairs and BAC end sequences. This process
yielded scaffolds that are on the order of
megabase pairs in size with gaps between
their contigs that generally correspond to
repetitive elements and occasionally to small
sequencing gaps. These scaffolds reconstruct
the majority of the unique sequence within a
genome.
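
The "at least two mate pairs" rule for linking U-unitigs can be sketched
as a simple support count. This is an illustration under assumptions: each
mate pair is assumed to have been reduced to a tuple of the two unitigs it
connects plus a symmetric relative-orientation label, and distance
consistency is not checked. The names are hypothetical.

```python
from collections import Counter

def confirmed_links(mate_links, min_support=2):
    """Keep unitig pairs supported by at least min_support mate pairs.

    mate_links: iterable of (unitig_a, unitig_b, orientation) tuples, where
    orientation is a symmetric label such as "same" or "opposite".
    """
    support = Counter()
    for a, b, orientation in mate_links:
        key = (min(a, b), max(a, b), orientation)  # order-independent key
        support[key] += 1
    return {key: n for key, n in support.items() if n >= min_support}

links = [
    ("u1", "u2", "same"),
    ("u2", "u1", "same"),
    ("u1", "u3", "same"),   # a single mate pair: not trusted on its own
]
trusted = confirmed_links(links)
```

Requiring independent agreement is what drives the error probability down:
a single false mate pair is plausible, but two false pairs implying the
same placement are far less so.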

For the Drosophila assembly, we engaged
in a three-stage repeat resolution strategy
where each stage was progressively more
aggressive and thus more likely to make a
mistake. For the human assembly, we
continued to use the first "Rocks" substage where
all unitigs with a good, but not definitive,
discriminator score are placed in a scaffold
gap. This was done with the condition that
two or more mate pairs with one of their
reads already in the scaffold unambiguously
place the unitig in the given gap. We estimate
the probability of inserting a unitig into an
incorrect gap with this strategy to be less than
10^-7 based on a probabilistic analysis.
We revised the ensuing "Stones" substage
of the human assembly, making it more like
the mechanism suggested in our earlier work
(43). For each gap, every read R that is placed
in the gap by virtue of its mated pair M being
in a contig of the scaffold and implying R's
placement is collected. Celera's mate-pairing
information is correct more than 99% of the
time. Thus, almost every, but not all, of the
reads in the set belong in the gap, and when
a read does not belong it rarely agrees with
the remainder of the reads. Therefore, we
simply assemble this set of reads within the
gap, eliminating any reads that conflict with
the assembly. This operation proved much
more reliable than the one it replaced for the
Drosophila assembly; in the assembly of a 
simulated shotgun data set of human chromo- 



[Fig. 4 diagram labels: 5.11X Celera reads; 39X mate pairs; Public bactigs (from 33,421 BACs); Bactigs & Celera pairs (binned by BAC); Components 1; Components 2]
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WG A Assembly CS A Assembly 

Fig. 4. Architecture of Celera's two-pronged assembly strategy. Each oval denotes a computation 
process performing the function indicated by its label, with the labels on arcs between ovals 
describing the nature of the objects produced and/or consumed by a process. This figure 
summarizes the discussion in the text that defines the terms and phrases used. 
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some 22, all stones were placed correctly. 

The final method of resolving gaps is to 
fill them with assembled BAC data that cover 
the gap. We call this "external gap walking."
We did not include the very aggressive "Peb- 
bles" substage described in our Drosophila 
work, which made enough mistakes so as to 
produce repeat reconstructions for long inter- 
spersed elements whose quality was only 
99.62% correct. We decided that for the hu- 
man genome it was philosophically better not 
to introduce a step that was certain to produce 
less than 99.99% accuracy. The cost was a 
somewhat larger number of gaps of some- 
what larger size. 

At the final stage of the assembly process, 
and also at several intermediate points, a 
consensus sequence of every contig is pro- 
duced. Our algorithm is driven by the princi- 
ple of maximum parsimony, with quality- 
value-weighted measures for evaluating each 
base. The net effect is a Bayesian estimate of
the correct base to report at each position.
Consensus generation uses Celera data whenever
it is present. In the event that no Celera
data cover a given region, the BAC data
sequence is used.
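
A much-simplified picture of quality-weighted base calling with the
Celera-first rule is sketched below. It is not the Bayesian estimator
described above: it simply sums quality values per candidate base and
picks the largest total. The function names and the column representation
are hypothetical.

```python
from collections import defaultdict

def call_base(column):
    """Pick the base with the largest summed quality value.

    column: non-empty list of (base, quality) pairs from the reads
    covering one alignment position.
    """
    weight = defaultdict(int)
    for base, quality in column:
        weight[base] += quality
    return max(weight, key=weight.get)

def consensus_base(celera_column, bac_column):
    """Use Celera reads whenever present; fall back to BAC data otherwise."""
    source = celera_column if celera_column else bac_column
    return call_base(source)

# Two moderate-quality A calls outweigh one high-quality G call.
base_1 = consensus_base([("A", 30), ("A", 25), ("G", 40)], [("G", 50)])
# No Celera coverage here, so the BAC-derived column decides.
base_2 = consensus_base([], [("T", 20)])
```

A position with no coverage from either source raises an error in this
sketch, since there is nothing to call.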

A key element of achieving a WGA of the
human genome was to parallelize the Overlapper
and the central consensus sequence-constructing
subroutines. In addition, memory was
a real issue: a straightforward application of
the software we had built for Drosophila would
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have required a computer with a 600-gigabyte 
RAM. By making the Overlapper and Unitigger
incremental, we were able to achieve the same
computation with a maximum of instantaneous
usage of 28 gigabytes of RAM. Moreover, the
incremental nature of the first three stages
allowed us to continually update the state of this
part of the computation as data were delivered
and then perform a 7-day run to complete
Scaffolding and Repeat Resolution whenever
desired. For our assembly operations, the total
compute infrastructure consists of 10
four-processor SMPs with 4 gigabytes of memory per
cluster (Compaq's ES40, Regatta) and a
16-processor NUMA machine with 64 gigabytes
of memory (Compaq's GS160, Wildfire). The
total compute for a run of the assembler was
roughly 20,000 CPU hours.

The assembly of Celera's data, together
with the shredded bactig data, produced a set of
scaffolds totaling 2.848 Gbp in span and
consisting of 2.586 Gbp of sequence. The chaff, or
set of reads not incorporated in the assembly,
numbered 11.21 million (26%), which is
consistent with our experience for Drosophila.
More than 84% of the genome was covered by
scaffolds >100 kbp long, and these averaged
91% sequence and 9% gaps with a total of
2.297 Gbp of sequence. There were a total of
93,857 gaps among the 1637 scaffolds >100
kbp. The average scaffold size was 1.5 Mbp,
the average contig size was 24.06 kbp, and the
average gap size was 2.43 kbp, where the
distribution of each was essentially exponential.
More than 50% of all gaps were less than 500
bp long, >62% of all gaps were less than 1 kbp
long, and no gap was >100 kbp long. Similarly,
more than 65% of the sequence is in contigs
>30 kbp, more than 31% is in contigs >100
kbp, and the largest contig was 1.22 Mbp long.
Table 3 gives detailed summary statistics for
the structure of this assembly with a direct
comparison to the compartmentalized shotgun
assembly.
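
The averages quoted for scaffolds >100 kbp can be re-derived from the
corresponding Table 3 counts. The sketch below only repeats arithmetic on
figures already given; the variable names are illustrative.

```python
# Whole-genome assembly, scaffolds >100 kbp (values from Table 3).
span_bp = 2_525_334_447        # bp in scaffolds, including gaps
sequence_bp = 2_297_678_935    # bp in contigs
n_scaffolds = 1_637
n_contigs = 95_494
n_gaps = 93_857

avg_scaffold = span_bp / n_scaffolds
avg_contig = sequence_bp / n_contigs
avg_gap = (span_bp - sequence_bp) / n_gaps
fraction_sequence = sequence_bp / span_bp

print(round(avg_scaffold), round(avg_contig), round(avg_gap))
print(f"{fraction_sequence:.0%} sequence")
```

The number of gaps equals the number of contigs minus the number of
scaffolds, since each scaffold with n contigs has n - 1 internal gaps.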

2.4 Compartmentalized shotgun 
assembly 

In addition to the WGA approach, we pur- 
sued a localized assembly approach that was 
intended to subdivide the genome into seg- 
ments, each of which could be shotgun as- 
sembled individually. We expected that this 
would help in resolution of large interchro- 
mosomal duplications and improve the statis- 
tics for calculating U-unitigs. The compart- 
mentalized assembly process involved clus- 
tering Celera reads and bactigs into large, 
multiple megabase regions of the genome,
and then running the WGA assembler on the 
Celera data and shredded, faux reads ob- 
tained from the bactig data. 

The first phase of the CSA strategy was to 
separate Celera reads into those that matched 
the BAC contigs for a particular PFP BAC 
entry, and those that did not match any public 
data. Such matches must be guaranteed to



Table 3. Scaffold statistics for whole-genome and compartmentalized shotgun assemblies.

Scaffold size                                   All        >30 kbp       >100 kbp       >500 kbp      >1000 kbp

Compartmentalized shotgun assembly
No. of bp in scaffolds
  (including intrascaffold gaps)      2,905,568,203  2,748,892,430  2,700,489,906  2,489,357,260  2,248,689,128
No. of bp in contigs                  2,653,979,733  2,524,251,302  2,491,538,372  2,320,648,201  2,106,521,902
No. of scaffolds                             53,591          2,845          1,935          1,060            721
No. of contigs                              170,033        112,207        107,199         93,138         82,009
No. of gaps                                 116,442        109,362        105,264         92,078         81,288
No. of gaps ≤1 kbp                           72,091         69,175         67,289         59,915         53,354
Average scaffold size (bp)                   54,217        966,219      1,395,602      2,348,450      3,118,848
Average contig size (bp)                     15,609         22,496         23,242         24,916         25,686
Average intrascaffold gap size (bp)           2,161          2,054          1,985          1,832          1,749
Largest contig (bp)                       1,988,321      1,988,321      1,988,321      1,988,321      1,988,321
% of total contigs                              100             95             94             87             79

Whole-genome assembly
No. of bp in scaffolds
  (including intrascaffold gaps)      2,847,890,390  2,574,792,618  2,525,334,447  2,328,535,466  2,140,943,032
No. of bp in contigs                  2,586,634,108  2,334,343,339  2,297,678,935  2,143,002,184  1,983,305,432
No. of scaffolds                            118,968          2,507          1,637            818            554
No. of contigs                              221,036         99,189         95,494         84,641         76,285
No. of gaps                                 102,068         96,682         93,857         83,823         75,731
No. of gaps ≤1 kbp                           62,356         60,343         59,156         54,079         49,592
Average scaffold size (bp)                   23,938      1,027,041      1,542,660      2,846,620      3,864,518
Average contig size (bp)                     11,702         23,534         24,061         25,319         25,999
Average intrascaffold gap size (bp)           2,560          2,487          2,426          2,213          2,082
Largest contig (bp)                       1,224,073      1,224,073      1,224,073      1,224,073      1,224,073
% of total contigs                              100             90             89             83             77

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properly place a Celera read, so all reads were 
first masked against a library of common 
repetitive elements, and only matches of at 
least 40 bp to unmasked portions of the read 
constituted a hit. Of Celera's 27.27 million 
reads, 20.76 million matched a bactig and 
another 0.62 million reads, which did not 
have any matches, were nonetheless identi- 
fied as belonging in the region of the bactig's 
BAC because their mate matched the bactig. 
Of the remaining reads, 2.92 million were 
completely screened out and so could not be 
matched, but the other 2.97 million reads had 
unmasked sequence totaling 1.189 Gbp that 
were not found in the GenBank data set. 
Because the Celera data are 5.11X redundant,
we estimate that 240 Mbp of unique Celera 
sequence is not in the GenBank data set. 
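
The read accounting in this paragraph can be checked with a few lines of
arithmetic. The sketch uses only figures quoted in the text; dividing the
unmatched sequence by the 5.11X redundancy is a simple reading of how the
unique-sequence estimate could be formed, and is an assumption.

```python
# Partition of Celera's reads, in millions (figures from the text).
matched_bactig = 20.76
placed_by_mate = 0.62
fully_screened = 2.92
unmatched = 2.97

total_reads = matched_bactig + placed_by_mate + fully_screened + unmatched
not_matching_genbank = fully_screened + unmatched

# Unmasked unmatched sequence (Mbp) divided by the fold redundancy.
unique_mbp = 1189 / 5.11

print(f"{total_reads:.2f} million reads in total")
print(f"{not_matching_genbank:.2f} million reads not matched to GenBank data")
print(f"about {unique_mbp:.0f} Mbp of unique sequence")
```

The four categories sum to the full 27.27 million reads, and the simple
division gives a value in the neighborhood of the 240 Mbp estimate.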

In the next step of the CSA process, a 
combining assembler took the relevant 5X 
Celera reads and bactigs for a BAC entry, and 
produced an assembly of the combined data 
for that locale. These high-quality sequence 
reconstructions were a transient result whose 
utility was simply to provide more reliable 
information for the purposes of their tiling
into sets of overlapping and adjacent scaffold
sequences in the next step. In outline, the 
combining assembler first examines the set of 
matching Celera reads to determine if there 
are excessive pileups indicative of un- 
screened repetitive elements. Wherever these 
occur, reads in the repeat region whose mates 
have not been mapped to consistent positions 
are removed. Then all sets of mate pairs that 
consistently imply the same relative position 
of two bactigs are bundled into a link and 
weighted according to the number of mates in 
the bundle. A "greedy" strategy then attempts 
to order the bactigs by selecting bundles of 
mate-pairs in order of their weight. A selected 
mate-pair bundle can tie together two forma- 
tive scaffolds. It is incorporated to form a 
single scaffold only if it is consistent with the 
majority of links between contigs of the scaf- 
fold. Once scaffolding is complete, gaps are 
filled by the "Stones" strategy described 
above for the WGA assembler. 
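
The greedy ordering step of the combining assembler can be sketched with a
union-find structure. This is an illustration under assumptions: each
bundle is reduced to the two bactigs it links and its mate count, a bundle
joining bactigs already in the same formative scaffold is skipped, and the
majority-consistency test is left as a caller-supplied predicate
(`is_consistent`, hypothetical) that accepts everything by default.

```python
def greedy_scaffold(bundles, is_consistent=lambda a, b, accepted: True):
    """Join bactigs into scaffolds, taking the heaviest bundles first.

    bundles: list of (bactig_a, bactig_b, n_mates).
    Returns the bundles accepted as joins, in the order they were taken.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    accepted = []
    for a, b, weight in sorted(bundles, key=lambda t: -t[2]):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # both bactigs already sit in one formative scaffold
        if is_consistent(a, b, accepted):
            parent[root_a] = root_b
            accepted.append((a, b, weight))
    return accepted

joins = greedy_scaffold([
    ("b1", "b2", 5),
    ("b2", "b3", 3),
    ("b1", "b3", 1),
    ("b4", "b5", 2),
])
```

Taking heavy bundles first means the best-supported joins fix the scaffold
structure before weaker, possibly repeat-induced, links are considered.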

The GenBank data for the Phase 1 and 2 
BACs consisted of an average of 19.8 bactigs 
per BAC of average size 8099 bp. Applica- 
tion of the combining assembler resulted in 
individual Celera BAC assemblies being put 
together into an average of 1.83 scaffolds 
(median of 1 scaffold) consisting of an aver- 
age of 8.57 contigs of average size 18,973 bp. 
In addition to defining order and orientation 
of the sequence fragments, there were 57% 
fewer gaps in the combined result. For Phase 
0 data, the average GenBank entry consisted 
of 91.52 reads of average length 784 bp. 
Application of the combining assembler re- 
sulted in an average of 54.8 scaffolds consist- 
ing of an average of 58.1 contigs of average 
size 873 bp. Basically, some small amount of
assembly took place, but not enough Celera
data were matched to truly assemble the 0.5X
to 1X data set represented by the typical
Phase 0 BACs. The combining assembler
was also applied to the Phase 3 BACs for
SNP identification, confirmation of assembly,
and localization of the Celera reads. The
phase 0 data suggest that a combined
whole-genome shotgun data set and 1X light-shotgun
of BACs will not yield good assembly of
BAC regions; at least 3X light-shotgun of
each BAC is needed.

The 5.89 million Celera fragments not
matching the GenBank data were assembled 
with our whole-genome assembler. The as- 
sembly resulted in a set of scaffolds totaling 
442 Mbp in span and consisting of 326 Mbp 
of sequence. More than 20% of the scaffolds 
were >5 kbp long, and these averaged 63% 
sequence and 27% gaps with a total of 302 
Mbp of sequence. All scaffolds >5 kbp were 
forwarded along with all scaffolds produced 
by the combining assembler to the subse- 
quent tiling phase. 

At this stage, we typically had one or two 
scaffolds for every BAC region constituting 
at least 95% of the relevant sequence, and a 
collection of disjoint Celera-unique scaffolds. 
The next step in developing the genome com- 
ponents was to determine the order and over- 
lap tiling of these BAC and Celera-unique 
scaffolds across the genome. For this, we 
used Celera's 50-kbp mate-pairs information,
and BAC-end pairs (18) and sequence tagged
site (STS) markers (44) to provide long-range
guidance and chromosome separation.
Given the relatively manageable number of 
scaffolds, we chose not to produce this tiling 
in a fully automated manner, but to compute 
an initial tiling with a good heuristic and then 
use human curators to resolve discrepancies 
or missed join opportunities. To this end, we 
developed a graphical user interface that dis- 
played the graph of tiling overlaps and the 
evidence for each. A human curator could 
then explore the implication of mapped STS 
data, dot-plots of sequence overlap, and a 
visual display of the mate-pair evidence sup- 
porting a given choice. The result of this 
process was a collection of "components," 
where each component was a tiled set of 
BAC and Celera-unique scaffolds that had 
been curator-approved. The process resulted 
in 3845 components with an estimated span 
of 2.922 Gbp. 

In order to generate the final CSA, we 
assembled each component with the WGA 
algorithm. As was done in the WGA process, 
the bactig data were shredded into a synthetic 
2X shotgun data set in order to give the 
assembler the freedom to independently as- 
semble the data. By using faux reads rather 
than bactigs, the assembly algorithm could 
correct errors in the assembly of bactigs and 
remove chimeric content in a PFP data entry. 



Chimeric or contaminating sequence (from
another part of the genome) would not be
incorporated into the reassembly of the com- 
ponent because it did not belong there. In 
effect, the previous steps in the CSA process 
served only to bring together Celera frag- 
ments and PFP data relevant to a large con- 
tiguous segment of the genome, wherein we 
applied the assembler used for WGA to pro- 
duce an ab initio assembly of the region. 

WGA assembly of the components resulted
in a set of scaffolds totaling 2.906 Gbp in
span and consisting of 2.654 Gbp of sequence.
The chaff, or set of reads not incorporated
into the assembly, numbered 6.17
million, or 22%. More than 90.0% of the
genome was covered by scaffolds spanning
>100 kbp long, and these averaged 92.2%
sequence and 7.8% gaps with a total of 2.492
Gbp of sequence. There were a total of
105,264 gaps among the 107,199 contigs that
belong to the 1940 scaffolds spanning >100
kbp. The average scaffold size was 1.4 Mbp,
the average contig size was 23.24 kbp, and
the average gap size was 2.0 kbp where each
distribution of sizes was exponential. As
such, averages tend to be underrepresentative
of the majority of the data. Figure 5 shows a
histogram of the bases in scaffolds of various
size ranges. Consider also that more than
49% of all gaps were <500 bp long, more
than 62% of all gaps were <1 kbp, and all
gaps are <100 kbp long. Similarly, more than
73% of the sequence is in contigs >30 kbp,
more than 49% is in contigs >100 kbp, and
the largest contig was 1.99 Mbp long. Table 3
provides summary statistics for the structure
of this assembly with a direct comparison to
the WGA assembly.

2.5 Comparison of the WGA and CSA 
scaffolds 

Having obtained two assemblies of the hu- 
man genome via independent computational 
processes (WGA and CSA), we compared
scaffolds from the two assemblies as another 
means of investigating their completeness, 
consistency, and contiguity. From each as- 
sembly, a set of reference scaffolds contain- 
ing at least 1000 fragments (Celera sequenc- 
ing reads or bactig shreds) was obtained; this 
amounted to 2218 WGA scaffolds and 1717 
CSA scaffolds, for a total of 2.087 Gbp and 
2.474 Gbp. The sequence of each reference 
scaffold was compared to the sequence of all 
scaffolds from the other assembly with which 
it shared at least 20 fragments or at least 20% 
of the fragments of the smaller scaffold. For 
each such comparison, all matches of at least 
200 bp with at most 2% mismatch were 
tabulated.
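
The rule for choosing which scaffold pairs to compare (at least 20 shared
fragments, or at least 20% of the fragments of the smaller scaffold) can
be written as a small predicate. This is a sketch: scaffolds are assumed
to be represented as sets of fragment identifiers, and the function name
is hypothetical.

```python
def should_compare(frags_a, frags_b, min_shared=20, min_fraction=0.20):
    """Decide whether two scaffolds share enough fragments to be aligned.

    frags_a, frags_b: sets of fragment identifiers (Celera reads or
    bactig shreds) belonging to each scaffold.
    """
    shared = len(frags_a & frags_b)
    smaller = min(len(frags_a), len(frags_b))
    return shared >= min_shared or (
        smaller > 0 and shared >= min_fraction * smaller
    )

reference = set(range(1000))
small_hit = set(range(985, 1045))    # 60 fragments, 15 shared (25%)
small_miss = set(range(990, 1050))   # 60 fragments, 10 shared (about 17%)
```

The fractional test lets small scaffolds qualify even when they cannot
reach 20 shared fragments in absolute terms.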

From this tabulation, we estimated the 
amount of unique sequence in each assembly 
in two ways. The first was to determine the 
number of bases of each assembly that were 
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not covered by a matching segment in the 
other assembly. Some 82.5 Mbp of the WGA 
(3.95%) was not covered by the CSA, where- 
as 204.5 Mbp (8.26%) of the CSA was not 
covered by the WGA. This estimate did not 
require any consistency of the assemblies or 
any uniqueness of the matching segments. 
Thus, another analysis was conducted in 
which matches of less than 1 kbp between a 
pair of scaffolds were excluded unless they
were confirmed by other matches having a 
consistent order and orientation. This gives 
some measure of consistent coverage: 1.982 
Gbp (95.00%) of the WGA is covered by the 
CSA, and 2.169 Gbp (87.69%) of the CSA is 
covered by the WGA by this more stringent 
measure.

The comparison of WGA to CSA also 
permitted evaluation of scaffolds for structur- 
al inconsistencies. We looked for instances in 
which a large section of a scaffold from one 
assembly matched only one scaffold from the 
other assembly, but failed to match over the 
full length of the overlap implied by the
matching segments. An initial set of candi- 
dates was identified automatically, and then 
each candidate was inspected by hand. From
this process, we identified 31 instances in 
which the assemblies appear to disagree in a 
nonlocal fashion. These cases are being further
evaluated to determine which assembly
is in error and why. 

In addition, we evaluated local inconsis- 
tencies of order or orientation. The following 
results exclude cases in which one contig in 
one assembly corresponds to more than one 
overlapping contig in the other assembly (as 
long as the order and orientation of the latter 
agrees with the positions they match in the 
former). Most of these small rearrangements 
involved segments on the order of hundreds 
of base pairs and rarely >1 kbp. We found a 
total of 295 kbp (0.012%) in the CSA assem- 
blies that were locally inconsistent with the 
WGA assemblies, whereas 2.108 Mbp 
(0.11%) in the WGA assembly were incon- 
sistent with the CSA assembly. 

The CSA assembly was a few percentage points better in terms of coverage and slightly
more consistent than the WGA, because it was in effect performing a few thousand shotgun
assemblies of megabase-sized problems, whereas the WGA was performing a shotgun assembly
of a gigabase-sized problem. When one considers the increase of two-and-a-half orders of
magnitude in problem size, the information loss between the two is remarkably small.
Because CSA was logistically easier to deliver and the better of the two results available
at the time when downstream analyses needed to begin, all subsequent analysis was
performed on this assembly.

2.6 Mapping scaffolds to the genome 

The final step in assembling the genome was to order and orient the scaffolds on the
chromosomes. We first grouped scaffolds together on the basis of their order in the
components from CSA. These grouped scaffolds were reordered by examining residual
mate-pairing data between the scaffolds. We next mapped the scaffold groups onto the
chromosome using physical mapping data. This step depends on having reliable
high-resolution map information such that each scaffold will overlap multiple markers.
There are two genome-wide types of map information available: high-density STS maps
and fingerprint maps of BAC clones developed at Washington University (45). Among the
genome-wide STS maps, GeneMap99 (GM99) has the most markers and therefore was most
useful for mapping scaffolds. The two different mapping approaches are complementary to
one another. The fingerprint maps should have better local order because they were built
by comparison of overlapping BAC clones. On the other hand, GM99 should have a more
reliable long-range order, because the framework markers were derived from
well-validated genetic maps. Both types of maps were used as a reference for human
curation of the components that were the input to the regional assembly, but they did not
determine the order of sequences produced by the assembler.



Fig. 5. Distribution of scaffold sizes of the CSA. For each range of scaffold sizes, the
percent of total sequence is indicated. (Horizontal axis: Scaffold Size, with bins <30 kb,
30-50 kb, 50-100 kb, 100-500 kb, 0.5-1 Mb.)



In order to determine the effectiveness of the fingerprint maps and GM99 for mapping
scaffolds, we first examined the reliability of these maps by comparison with large
scaffolds. Only 1% of the STS markers on the 10 largest scaffolds (those >9 Mbp) were
mapped on a different chromosome on GM99. Two percent of the STS markers disagreed in
position by more than five framework bins. However, for the fingerprint maps, a 2%
chromosome discrepancy was observed, and on average 23.8% of BAC locations in the
scaffold sequence disagreed with fingerprint map placement by more than five BACs. On
further examination of the source of discrepancy, we found that most of it came from 4 of
the 10 scaffolds, indicating that there is variation in the quality of either the map or
the scaffolds. All four scaffolds were assembled as well as the other six, as judged by
clone coverage analysis, and showed the same low discrepancy rate to GM99, and thus we
concluded that the fingerprint map global order in these cases was not reliable. Smaller
scaffolds had a higher discordance rate with GM99 (4.21% of STSs were discordant by more
than five framework bins), but a lower discordance rate with the fingerprint maps (11% of
BACs disagreed with fingerprint maps by more than five BACs). This observation agrees
with the clone coverage analysis (46) that Celera scaffold construction was better
supported by long-range mate pairs in larger scaffolds than in small scaffolds.

We created two orderings of Celera scaf- 
folds on the basis of the markers (BAC or 
STS) on these maps. Where the order of 
scaffolds agreed between GM99 and the 
WashU BAC map, we had a high degree of 
confidence that that order was correct; these
scaffolds were termed "anchor scaffolds." 
Only scaffolds with a low overall discrepancy 
rate with both maps were considered anchor 
scaffolds. Scaffolds in GM99 bins were al- 
lowed to permute in their order to match 
WashU ordering, provided they did not vio- 
late their framework orders. Orientation of 
individual scaffolds was determined by the 
presence of multiple mapped markers with 
consistent order. Scaffolds with only one 
marker have insufficient information to assign
orientation. We found 70.1% of the genome
in anchored scaffolds, more than 99%
of which are also oriented (Table 4). Because
GM99 is of lower resolution than the WashU 
map, a number of scaffolds without STS 
matches could be ordered relative to the an- 
chored scaffolds because they included se- 
quence from the same or adjacent BACs on 
the WashU map. On the other hand, because 
of occasional WashU global ordering discrepancies,
a number of scaffolds determined
to be "unmappable" on the WashU map could 
be ordered relative to the anchored scaffolds 






16 FEBRUARY 2001 VOL 291 SCIENCE www.sciencemag.org 



with GM99. These scaffolds were termed 
"ordered scaffolds." We found that 13.9% of 
the assembly could be ordered by these ad- 
ditional methods, and thus 84.0% of the ge- 
nome was ordered unambiguously. 

Next, all scaffolds that could be placed, 
but not ordered, between anchors were as- 
signed to the interval between the anchored 
scaffolds and were deemed to be "bounded" between them. For example, small
scaffolds having STS hits from the same GeneMap
bin or hitting the same BAC cannot be
ordered relative to each other, but can be 
assigned a placement boundary relative to 
other anchored or ordered scaffolds. The 
remaining scaffolds either had no localiza- 
tion information, conflicting information, 
or could only be assigned to a generic 
chromosome location. Using the above ap- 
proaches, ~98% of the genome was an- 
chored, ordered, or bounded. 

Finally, we assigned a location for each 
scaffold placed on the chromosome by 
spreading out the scaffolds per chromosome. 
We assumed that the remaining unmapped 
scaffolds, constituting 2% of the genome, 
were distributed evenly across the genome. 
By dividing the sum of unmapped scaffold 
lengths with the sum of the number of 
mapped scaffolds, we arrived at an estimate 
of interscaffold gap of 1483 bp. This gap was 
used to separate all the scaffolds on each 
chromosome and to assign an offset in the
chromosome. 
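The placement step just described can be sketched as follows. This is a minimal illustration under stated assumptions: scaffold lengths are given in chromosome order, the gap is rounded down to a whole number of base pairs, and the function names are hypothetical rather than part of the published pipeline.

```python
def estimate_gap(unmapped_lengths, n_mapped_scaffolds):
    """Average unmapped sequence per mapped scaffold, in whole base pairs."""
    return sum(unmapped_lengths) // n_mapped_scaffolds


def assign_offsets(scaffold_lengths, gap):
    """Return the start offset of each scaffold when a fixed gap separates neighbors."""
    offsets = []
    position = 0
    for length in scaffold_lengths:
        offsets.append(position)
        # The next scaffold starts after this one plus one interscaffold gap.
        position += length + gap
    return offsets
```

For example, `assign_offsets([100, 200, 50], gap=10)` places the three scaffolds at offsets 0, 110, and 320. The toy numbers are illustrative only and are not meant to reproduce the 1483-bp estimate reported in the text.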

During the scaffold-mapping effort, we encountered
many problems that resulted in additional
quality assessment and validation analysis.
At least 978 (3% of 33,173) BACs were
believed to have sequence data from more than 
one location in the genome (47). This is con- 
sistent with the bactig chimerism analysis re- 
ported above in the Assembly Strategies sec- 
tion. These BACs could not be assigned to 
unique positions within the CSA assembly and 
thus could not be used for ordering scaffolds. 
Likewise, it was not always possible to assign 
STSs to unique locations in the assembly be- 
cause of genome duplications, repetitive ele- 
ments, and pseudogenes. 

Because of the time required for an ex- 
haustive search for a perfect overlap, CSA 
generated 21,607 intrascaffold gaps where 
the mate-pair data suggested that the contigs 
should overlap, but no overlap was found. 
These gaps were defined as a fixed 50 bp in 
length and make up 18.6% of the total 
116,442 gaps in the CSA assembly.

We chose not to use the order of exons 
implied in cDNA or EST data as a way of 
ordering scaffolds. The rationale for not us- 
ing this data was that doing so would have 
biased certain regions of the assembly by 
rearranging scaffolds to fit the transcript data 
and made validation of both the assembly and 
gene definition processes more difficult. 

2.7 Assembly and validation analysis

We analyzed the assembly of the genome 
from the perspectives of completeness 
(amount of coverage of the genome) and 
correctness (the structural accuracy of the 
order and orientation and the consensus se- 
quence of the assembly). 

Completeness. Completeness is defined as the percentage of the euchromatic sequence
represented in the assembly. This cannot be known with absolute certainty until the
euchromatin sequence has been completed. However, it is possible to estimate
completeness on the basis of (i) the estimated sizes of intrascaffold gaps; (ii) coverage
of the two published chromosomes, 21 and 22 (48, 49); and (iii) analysis of the percentage
of an independent set of random sequences (STS markers) contained in the assembly. The
whole-genome libraries contain heterochromatic sequence and, although no attempt has
been made to assemble it, there may be instances of unique sequence embedded in regions
of heterochromatin as were observed in Drosophila (50, 51).

The sequences of human chromosomes 21
and 22 have been completed to high quality 
and published (48, 49). Although this se- 
quence served as input to the assembler, the 
finished sequence was shredded into a shotgun
data set so that the assembler had the
opportunity to assemble it differently from 
the original sequence in the case of structural 
polymorphisms or assembly errors in the 
BAC data. In particular, the assembler must 
be able to resolve repetitive elements at the 
scale of components (generally multimega- 
base in size), and so this comparison reveals 
the level to which the assembler resolves 
repeats. In certain areas, the assembly struc- 
ture differs from the published versions of 
chromosomes 21 and 22 (see below). The 
consequence of the flexibility to assemble 
"finished" sequence differently on the basis 
of Celera data resulted in an assembly with 
more segments than the chromosome 21 and
22 sequences. We examined the reasons why 
there are more gaps in the Celera sequence 
than in chromosomes 21 and 22 and expect 
that they may be typical of gaps in other 
regions of the genome. In the Celera assem- 
bly, there are 25 scaffolds, each containing at 
least 10 kb of sequence, that collectively span 
94.3% of chromosome 21. Sixty-two scaf- 
folds span 95.7% of chromosome 22. The 
total length of the gaps remaining in the
Celera assembly for these two chromosomes 
is 3.4 Mbp. These gap sequences were ana- 
lyzed by RepeatMasker and by searching 
against the entire genome assembly (52). 
About 50% of the gap sequence consisted of 
common repetitive elements identified by Re- 
peatMasker; more than half of the remainder 
was lower copy number repeat elements. 

A more global way of assessing completeness is to measure the content of an independent
set of sequence data in the assembly. We compared 48,938 STS markers from GeneMap99
(51) to the scaffolds. Because these markers were not used in the assembly processes, they
provided a truly independent measure of completeness. ePCR (53) and BLAST (54) were
used to locate STSs on the assembled genome. We found 44,524 (91%) of the STSs in the
mapped genome. An additional 2648 markers (5.4%) were found by searching the
unassembled data or "chaff." We identified 1283 STS markers (2.6%) not found in either
Celera sequence or BAC data as of September 2000, raising the possibility that these
markers may not be of human origin. If that were the case, the Celera assembled sequence
would represent 93.4% of the human genome and the unassembled data 5.5%, for a total of
98.9% coverage. Similarly, we compared CSA against 36,678 TNG radiation hybrid markers
(55a) using the same method. We found that 32,371 markers (88%) were located in the
mapped CSA scaffolds, with 2055 markers (5.6%) found in the remainder. This gave a 94%
coverage of the genome through another genome-wide survey.
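The marker-based completeness figures above are simple ratios, and the short sketch below recomputes them from the counts quoted in the text. The helper names are hypothetical; the only assumption is that the adjusted figures are obtained by removing the 1283 unlocated markers from the denominator, which is how we read the sentence about markers that may not be of human origin.

```python
def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Counts quoted in the text for the GeneMap99 STS comparison.
TOTAL_STS = 48_938
IN_MAPPED_ASSEMBLY = 44_524
IN_CHAFF = 2_648
NOT_FOUND = 1_283


def completeness(total, in_assembly, in_chaff, excluded=0):
    """Percent of markers in the assembly and in the chaff, after excluding some markers."""
    denominator = total - excluded
    return percent(in_assembly, denominator), percent(in_chaff, denominator)


# Raw percentages over all markers, then percentages with unlocated markers excluded.
raw = completeness(TOTAL_STS, IN_MAPPED_ASSEMBLY, IN_CHAFF)
adjusted = completeness(TOTAL_STS, IN_MAPPED_ASSEMBLY, IN_CHAFF, excluded=NOT_FOUND)
```

Under this reading the recomputed values land close to the published 91% and 5.4% (raw) and 93.4% and 5.5% (adjusted), with small differences attributable to rounding.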

Correctness. Correctness is defined as the structural and sequence accuracy of the
assembly. Because the source sequences for the Celera data and the GenBank data are from
different individuals, we could not directly compare the consensus sequence of the
assembly against other finished sequence for determining sequencing accuracy at the
nucleotide level, although this has been done for identifying polymorphisms as described
in Section 6. The accuracy of the consensus sequence is at least 99.96% on the basis of a
statistical estimate derived from the quality values of the underlying reads.

Table 4. Summary of scaffold mapping. Scaffolds were mapped to the genome with different
levels of confidence (anchored scaffolds have the highest confidence; unmapped scaffolds
have the lowest). Anchored scaffolds were consistently ordered by the WashU BAC map and
GM99. Ordered scaffolds were consistently ordered by at least one of the following: the
WashU BAC map, GM99, or component tiling path. Bounded scaffolds had order conflicts
between at least two of the external maps, but their placements were adjacent to a
neighboring anchored or ordered scaffold. Unmapped scaffolds had, at most, a chromosome
assignment. The scaffold subcategories are given below each category.

Mapped scaffold category    Number      Length (bp)   % Total length
Anchored                     1,526    1,860,676,676               70
  Oriented                   1,246    1,852,088,645               70
  Unoriented                   280        8,588,031              0.3
Ordered                      2,001      369,235,857               14
  Oriented                     839      329,633,166               12
  Unoriented                 1,162       39,602,691                2
Bounded                     38,241      368,753,463               14
  Oriented                   7,453      274,536,424               10
  Unoriented                30,788       94,217,039                4
Unmapped                    11,823       55,313,737                2
  Known chromosome             281        2,505,844              0.1
  Unknown chromosome        11,542       52,807,893                2

The structural consistency of the assembly can be measured by mate-pair analysis. In a
correct assembly, every mated pair of sequencing reads should be located on the consensus
sequence with the correct separation and orientation between the pairs. A pair is termed
"valid" when the reads are in the correct orientation and the distance between them is
within the mean ± 3 standard deviations of the distribution of insert sizes of the library
from which the pair was sampled. A pair is termed "misoriented" when the reads are not
correctly oriented, and is termed "misseparated" when the distance between the reads is
not in the correct range but the reads are correctly oriented. The mean ± the standard
deviation of each library used by the assembler was determined as described above. To
validate these, we examined all reads mapped to the finished sequence of chromosome 21
(48) and determined how many incorrect mate pairs there were as a result of laboratory
tracking errors and chimerism (two different segments of the genome cloned into the same
plasmid), and how tight the distribution of insert sizes was for those that were correct
(Table 5). The standard deviations for all Celera libraries were quite small, less than
15% of the insert length, with the exception of a few 50-kbp libraries. The 2- and 10-kbp
libraries contained less than 2% invalid mate pairs, whereas the 50-kbp libraries were
somewhat higher (~10%). Thus, although the mate-pair information was not perfect, its
accuracy was such that measuring valid, misoriented, and misseparated pairs with respect
to a given assembly was deemed to be a reliable instrument for validation purposes,
especially when several mate pairs confirm or deny an ordering.
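The three mate-pair categories defined above translate directly into a small classifier. The sketch below is illustrative only: the function name and argument layout are hypothetical, and the orientation check is assumed to have been done upstream and passed in as a boolean.

```python
def classify_mate_pair(distance, correctly_oriented, mean, sd, n_sd=3.0):
    """Classify a mate pair as 'valid', 'misoriented', or 'misseparated'.

    distance: observed separation of the two reads on the consensus (bp)
    correctly_oriented: True if the reads have the expected relative orientation
    mean, sd: insert-size mean and standard deviation of the pair's library (bp)
    """
    if not correctly_oriented:
        return "misoriented"
    # Valid pairs lie within mean +/- n_sd standard deviations.
    if abs(distance - mean) <= n_sd * sd:
        return "valid"
    return "misseparated"
```

With a library mean of 2,081 bp and SD of 106 bp (the first chromosome 21 row of Table 5), a correctly oriented pair separated by 2,100 bp would be classified as valid, whereas one separated by 3,000 bp would be misseparated.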

The clone coverage of the genome was 39X, meaning that any given base pair was, on
average, contained in 39 clones or, equivalently, spanned by 39 mate-paired reads. Areas
of low clone coverage or areas with a high proportion of invalid mate pairs would
indicate potential assembly problems. We computed the coverage of each base in the
assembly by valid mate pairs (Table 6). In summary, for scaffolds >30 kbp in length, less
than 1% of the Celera assembly was in regions of less than 3X clone coverage. Thus, more
than 99% of the assembly, including order and orientation, is strongly supported by this
measure alone.
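Per-base clone coverage of the kind described above can be computed with a difference array. The sketch below is a minimal illustration, assuming each valid mate pair is supplied as the half-open (start, end) span of its clone on a single scaffold; the function names are hypothetical and the original computation may have differed.

```python
def clone_coverage(length, valid_pairs):
    """Return per-base clone depth for a scaffold of the given length.

    valid_pairs: iterable of half-open (start, end) spans, one per valid mate pair.
    """
    delta = [0] * (length + 1)
    for start, end in valid_pairs:
        delta[start] += 1   # the clone begins covering at start
        delta[end] -= 1     # and stops covering at end
    coverage = []
    depth = 0
    for i in range(length):
        depth += delta[i]
        coverage.append(depth)
    return coverage


def fraction_below(coverage, threshold=3):
    """Fraction of bases whose clone depth is below the threshold."""
    return sum(1 for depth in coverage if depth < threshold) / len(coverage)
```

Calling `fraction_below` with the default threshold of 3 corresponds to the "less than 3X clone coverage" criterion quoted in the text.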

We examined the locations and number of all misoriented and misseparated mates. In
addition to doing this analysis on the CSA assembly (as of 1 October 2000), we also
performed a study of the PFP assembly as of 5 September 2000 (30, 55b). In this latter
case, Celera mate pairs had to be mapped to the PFP assembly. To avoid mapping errors
due to high-fidelity repeats, the only pairs mapped were those for which both reads
matched at only one location with less than 6% differences. A threshold was set such that
sets of five or more simultaneously invalid mate pairs indicated a potential breakpoint,
where the construction of the two assemblies differed. The graphic comparison of the CSA
chromosome 21 assembly with the published sequence (Fig. 6A) serves as a validation of
this methodology. Blue tick marks in the panels indicate breakpoints. There were a
similar (small) number of breakpoints on both chromosome sequences. The exception was 12
sets of scaffolds in the Celera assembly (a total of 3% of the chromosome length in 212
single-contig scaffolds) that were mapped to the wrong positions because they were too
small to be mapped reliably. Figures 6 and 7 and Table 6 illustrate the mate-pair
differences and breakpoints between the two assemblies. There was a higher percentage of
misoriented and misseparated mate pairs in the large-insert libraries (50 kbp and BAC
ends) than in the small-insert libraries in both assemblies (Table 6). The large-insert
libraries are more likely to identify discrepancies simply because they span a larger
segment of the genome. The graphic comparison between the two assemblies for chromosome 8
(Fig. 6, B and C) shows that there are many



Table 5. Mate-pair validation. Celera fragment sequences were mapped to the published
sequence of chromosome 21. Each mate pair uniquely mapped was evaluated for correct
orientation and placement (number of mate pairs tested). If the two mates had incorrect
relative orientation or placement, they were considered invalid (number of invalid mate
pairs).

                   Chromosome 21                                                          Genome
Library  Library   Mean insert  SD       SD/mean  No. of mate   No. of invalid  %         Mean insert  SD       SD/mean
type     no.       size (bp)    (bp)     (%)      pairs tested  mate pairs      invalid   size (bp)    (bp)     (%)
[...]    [...]     2,081        106      5.1      3,642         38              1.0       2,082        90       4.3
[...]    [...]     1,913        152      7.9      28,029        413             1.5       1,923        118      6.1
[...]    [...]     2,166        175      8.1      4,405         57              1.3       2,162        158      7.3
[...]    [...]     11,385       851      7.5      4,319         80              1.9       11,370       696      6.1
[...]    [...]     14,523       1,875    12.9     7,355         [...]           2.1       14,142       1,402    9.9
[...]    [...]     9,635        1,035    10.7     5,573         109             2.0       9,606        934      9.7
[...]    [...]     10,223       928      9.1      34,079        399             1.2       10,190       777      7.6
[...]    [...]     64,888       2,747    4.2      16            1               6.3       65,500       5,504    8.4
[...]    [...]     53,410       5,834    10.9     914           170             18.6      53,311       5,546    10.4
[...]    [...]     52,034       7,312    14.1     5,871         569             9.7       51,498       6,588    12.8
[...]    [...]     52,282       7,454    14.3     2,629         213             8.1       52,282       7,454    14.3
[...]    [...]     46,616       7,378    15.8     2,153         215             10.0      45,418       9,068    20.0
[...]    [...]     55,788       10,099   18.1     2,244         [...]           11.1      53,062       10,893   20.5
[...]    [...]     39,894       5,019    12.6     199           7               3.5       36,838       9,988    27.1
[...]    [...]     48,931       9,813    20.1     144           10              6.9       47,845       4,774    10.0
[...]    [...]     48,130       4,232    8.8      195           14              7.2       47,924       4,581    9.6
[...]    [...]     106,027      27,778   26.2     330           16              4.8       152,000      26,600   17.5
[...]    [...]     160,575      54,973   34.2     155           8               5.2       161,750      27,000   16.7
[...]    [...]     164,155      19,453   11.9     642           44              6.9       176,500      19,500   11.05
Total                                             102,894       2,768           2.7 (mean)






[...] for the Celera [...] for both assemblies [...] side fashion. The order and
orientation of Celera's assembly shows [...] fewer breakpoints except [...] two [...]
chromosomes. Figure 7 also [...] (>10 kbp) in both assemblies as [...] marks. In [...]
assembly, the size of all [...] basis of [...] assemblies were [...] genome assemblies.

3 Gene Prediction and Annotation 

[...] we developed [...] used to increase the [...] between the Otto-predicted [...]
prediction [...] initial computational approach.



3.1 Automated gene annotation

A gene is a locus of cotranscribed exons. A [...] multiple distinct proteins [...]
initiation and termination sites. [...] able to discern within [...] pairs of the
genomic DNA [...] the signals for initiating [...]

The number of protein-coding genes [...] the corporate and public sectors [...]
transcript density [...] quite different, and [...] In stark contrast are [...] much
lower estimates: one of 35,0[...] derived with [...] sampling [...] with chromosome 22
data [...] of 28,000 to 34,000 genes derived with [...] comparative methodology
involving [...] between human and the [...] are likely to [...] genes. [...] also
critical to [...] inventory. The sec[...]

Table 6. Genome-wide mate-pair [...]

Genome library: 2 kbp, 10 kbp, 50 kbp, BES



[...] ESTs. The following section de[...] the methods we have developed to [...]
problems for the prediction of [...] system called Otto, to identify and characterize
[...] evidence provided by the computational pipe[...] For example, [...] homology to
[...] evaluate whether [...] into a longer, virtual mRNA. The curator would [...]
evaluate the [...] similarity and the contiguity of the [...] asking whether [...]

Initially, gene boundaries are predicted on the basis of [...] overlapping protein and
EST matches generated by a [...] sequences [...] EST [...] of related [...]
gene boundaries. During this process, multiple
hits to the same region were collapsed to a 
coherent set of data by tracking the coverage of 
a region. For example, if a group of bases was 
represented by multiple overlapping ESTs, the 
union of these regions matched by the set of 
ESTs on the scaffold was marked as being 
supported by EST evidence. This resulted in a 
series of "gene bins," each of which was believed
to contain a single gene. One weakness of
this initial implementation of the algorithm was 
in predicting gene boundaries in regions of tan- 
demly duplicated genes. Gene clusters frequent- 
ly resulted in homologous neighboring genes 
being joined together, resulting in an annotation
that artificially concatenated these gene models. 
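The collapsing of overlapping hits into gene bins can be illustrated by a short interval-merging sketch. It assumes hits on one scaffold are given as half-open (start, end) coordinates tagged with an evidence type; the function name and the tuple layout are hypothetical and are not taken from the Otto system.

```python
def build_gene_bins(hits):
    """Merge overlapping evidence hits into candidate gene bins.

    hits: iterable of (start, end, evidence_type) on one scaffold, half-open.
    Returns a list of (start, end, set_of_evidence_types), sorted by start.
    """
    bins = []
    for start, end, kind in sorted(hits):
        if bins and start <= bins[-1][1]:
            # Overlaps (or abuts) the current bin: extend it and record the evidence.
            last_start, last_end, kinds = bins[-1]
            bins[-1] = (last_start, max(last_end, end), kinds | {kind})
        else:
            bins.append((start, end, {kind}))
    return bins
```

Because any overlap joins two hits, a sketch like this would also fuse neighboring homologous genes whose evidence overlaps, which is the weakness with tandemly duplicated genes noted above.
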
Next, known genes (those with exact matches
of a full-length cDNA sequence to the genome)
were identified, and the region corresponding
to the cDNA was annotated as a
predicted transcript. A subset of the curated
human gene set RefSeq from the National
Center for Biotechnology Information
(NCBI) was included as a data set searched in
the computational pipeline. If a RefSeq tran- 
script matched the genome assembly for at least 
50% of its length at >92% identity, then the 
SIM4 (63) alignment of the RefSeq transcript to 



the region of the genome under analysis was 
promoted to the status of an Otto annotation. 
Because the genome sequence has gaps and 
sequence errors such as frameshifts, it was not 
always possible to predict a transcript that 
agrees precisely with the experimentally deter- 
mined cDNA sequence. A total of 6538 genes 
in our inventory were identified and transcripts 
predicted in this way. 
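The promotion rule for RefSeq alignments stated above (at least 50% of transcript length matched at >92% identity) can be written as a one-line predicate. The sketch below uses hypothetical argument names and assumes identity is expressed as a fraction between 0 and 1.

```python
def promote_refseq_alignment(aligned_length, transcript_length, identity,
                             min_fraction=0.5, min_identity=0.92):
    """Return True if a RefSeq alignment meets the length and identity thresholds.

    aligned_length: number of transcript bases matched to the assembly
    transcript_length: full length of the RefSeq transcript
    identity: fractional identity of the match (0.0 to 1.0)
    """
    long_enough = aligned_length / transcript_length >= min_fraction
    similar_enough = identity > min_identity
    return long_enough and similar_enough
```

The length test is inclusive and the identity test is strict, following the wording "at least 50%" and ">92%".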

Fig. 6. Comparison of the CSA and the PFP assembly. (A) All of chromosome 21, (B) all of
chromosome 8, and (C) a 1-Mb region of chromosome 8 representing a single Celera
scaffold. To generate the figure, Celera fragment sequences were mapped onto each
assembly. The PFP assembly is indicated in the upper third of each panel; the Celera
assembly is indicated in the lower third. In the center of the panel, green lines show
Celera sequences that are in the same order and orientation in both assemblies and form
the longest consistently ordered run of sequences. Yellow lines indicate sequence blocks
that are in the same orientation, but out of order. Red lines indicate sequence blocks
that are not in the same orientation. For clarity, in the latter two cases, lines are only
drawn between segments of matching sequence that are at least 50 kbp long. The top and
bottom thirds of each panel show the extent of Celera mate-pair violations (red,
misoriented; yellow, incorrect distance between the mates) for each assembly grouped by
library size. (Mate pairs that are within the correct distance, as expected from the mean
library insert size, are omitted from the figure for clarity.) Predicted breakpoints,
corresponding to stacks of violated mate pairs of the same type, are shown as blue ticks
on each assembly axis. Runs of more than 10,000 Ns are shown as cyan bars. Plots of all 24
chromosomes can be seen in Web fig. 3 on Science Online at
www.sciencemag.org/cgi/content/full/291/5507/1304/DC1.

Regions that have a substantial amount of sequence similarity, but do not match known
genes, were analyzed by that part of the Otto system that uses the sequence similarity
information to predict a transcript. Here, Otto evaluates evidence generated by the
computational pipeline, corresponding to conservation between mouse and human genomic
DNA, similarity to human transcripts (ESTs and cDNAs), similarity to rodent transcripts
(ESTs and cDNAs), and similarity of the translation of human genomic DNA to known
proteins to predict potential genes in the human genome. The sequence from the region of
genomic DNA contained in a gene bin was extracted, and the subsequences supported by any
homology evidence were marked (plus 100 bases flanking these regions). The other bases in
the region, those not covered by any homology evidence, were replaced by N's. This
sequence segment, with high confidence regions represented by the consensus genomic
sequence and the remainder represented by N's, was then evaluated by Genscan to see if a
consistent gene model could be generated. This procedure simplified the gene-prediction
task by first establishing the boundary for the gene (not a strength of most gene-finding
algorithms), and by eliminating regions with no supporting evidence. If Genscan returned
a plausible gene model, it was further evaluated before being promoted to an "Otto"
annotation. The final Genscan predictions were often quite different from the prediction
that Genscan returned on the same region of native genomic sequence. A weakness of using
Genscan to refine the gene model is the loss of valid, small exons from the final
annotation.
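The masking step described above, in which bases lacking homology evidence are replaced by N's before the sequence is handed to Genscan, can be sketched as follows. The function name is hypothetical, evidence is assumed to be given as half-open (start, end) intervals within the gene bin, and the 100-base flank is the only parameter taken from the text.

```python
def mask_unsupported(sequence, evidence, flank=100):
    """Replace bases outside evidence-supported regions (plus flank) with 'N'.

    sequence: genomic sequence of one gene bin
    evidence: iterable of half-open (start, end) intervals with homology support
    flank: number of bases kept on each side of every supported interval
    """
    keep = [False] * len(sequence)
    for start, end in evidence:
        lo = max(0, start - flank)
        hi = min(len(sequence), end + flank)
        for i in range(lo, hi):
            keep[i] = True
    return "".join(base if kept else "N" for base, kept in zip(sequence, keep))
```

The masked string has the same length as the input, so coordinates of any gene model predicted on it remain valid on the original gene bin.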

Table 7. Sensitivity and specificity of Otto and Genscan. Sensitivity and specificity
were calculated by first aligning the prediction to the published RefSeq transcript,
tallying the number (N) of uniquely aligned RefSeq bases. Sensitivity is the ratio of N
to the length of the published RefSeq transcript. Specificity is the ratio of N to the
length of the prediction. All differences are significant (Tukey HSD; P < 0.001).

Method                  Sensitivity   Specificity
Otto (RefSeq only)*     0.939         0.973
Otto (homology)†        0.604         0.884
Genscan                 0.501         0.633

*Refers to those annotations produced by Otto using only the Sim4-polished RefSeq
alignment rather than an evidence-based Genscan prediction. †Refers to those annotations
produced by supplying all available evidence to Genscan.

The next step in defining gene structures based on sequence similarity was to compare
each predicted transcript with the homology-based evidence that was used in previous
steps to evaluate the depth of evidence for each exon in the prediction. Internal exons
were considered to be supported if they were covered by homology evidence to within ±10
bases of their edges. For first and last exons, the internal edge was required to be
within 10 bases, but the external edge was allowed greater latitude to allow for 5' and 3'
untranslated regions (UTRs). To be retained, a prediction for a multi-exon gene must have
evidence such that the total number of "hits," as defined above, divided by the number of
exons in the prediction must be >0.66 or must correspond to a RefSeq sequence. A
single-exon gene must be covered by at least three supporting hits (±10 bases on each
side), and these must cover the complete predicted open reading frame. For a single-exon
gene, we also required that the Genscan prediction include both a start and a stop codon.
Gene models that did not meet these criteria were disregarded, and those that passed were
promoted to Otto predictions. Homology-based Otto predictions do not contain 3' and 5'
untranslated sequence. Although three de novo gene-finding programs [GRAIL, Genscan, and
FgenesH (63)] were run as part of the computational analysis, the results of these
programs were not directly used in making the Otto predictions. Otto predicted 11,226
additional genes by means of sequence similarity.
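The retention rule for multi-exon predictions can be sketched as below. This is a simplified reading, not the Otto implementation: a "hit" is interpreted as an exon with at least one evidence interval whose edges both fall within the tolerance, first and last exons are treated like internal ones (the extra latitude for UTRs is ignored), and all names are hypothetical.

```python
def exon_supported(exon, evidence, tolerance=10):
    """True if some evidence interval matches both exon edges within the tolerance."""
    start, end = exon
    return any(
        abs(ev_start - start) <= tolerance and abs(ev_end - end) <= tolerance
        for ev_start, ev_end in evidence
    )


def keep_multi_exon_prediction(exons, evidence, is_refseq=False,
                               min_ratio=0.66, tolerance=10):
    """Apply the >0.66 supported-exon ratio, or accept outright if RefSeq-derived."""
    if is_refseq:
        return True
    supported = sum(exon_supported(exon, evidence, tolerance) for exon in exons)
    return supported / len(exons) > min_ratio
```

Under this reading a three-exon prediction needs at least two supported exons to be retained, since 2/3 exceeds 0.66 while 1/3 does not.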

3.2 Otto validation 

To validate the Otto homology-based process 
and the method that Otto uses to define the 
structures of known genes, we compared tran- 
scripts predicted by Otto with their correspond- 
ing (and presumably correct) transcript from a 
set of 4512 RefSeq transcripts for which there 
. was a unique SIM4 alignment (Table 7). In 
order to evaluate the relative performance of 
. Otto and Genscan, we made three comparisons. 
The first involved a determination of the accu- 
racy of gene models predicted by Otto with 
only homology data other than the correspond- 
ing RefSeq sequence (Otto homology in Table 
7). We measured the sensitivity (correctly pre- 
. dieted bases divided by the total length of the 
-cDNA) and specificity (correctly predicted 
bases divided by the sum of the correctly and 
incorrectly predicted bases). Second, we exam- 
, ined the sensitivity arid specificity of the Otto 
predictions that were made solely with the Ref- 
Seq sequence, which is the process that Otto, 
uses to annotate known genes (Otto-RefSeq). 
And third, we determined the accuracy of the 
Genscan predictions corresponding to these 
RefSeq sequences. As expected, the alignment 
method (Otto-RefSeq) was the most accurate, 
and Otto-homology performed better than Genscan
by both criteria. Thus, 6.1% of true RefSeq
nucleotides were not represented in the Otto-RefSeq
annotations and 2.7% of the nucleotides
in the Otto-RefSeq transcripts were not contained
in the original RefSeq transcripts. The
discrepancies could come from legitimate 
differences between the Celera assembly 
and the RefSeq transcript due to polymor- 
phisms, incomplete or incorrect data in the 
Celera assembly, errors introduced by Sim4 
during the alignment process, or the pres- 
ence of alternatively spliced forms in the 
data set used for the comparisons. 
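
The sensitivity and specificity measures defined above can be illustrated with a minimal sketch. It assumes that exons are given as half-open (start, end) coordinate pairs on the same sequence; the function names are hypothetical and are not part of Otto.

```python
def covered_bases(intervals):
    """Return the set of base positions covered by half-open (start, end) intervals."""
    bases = set()
    for start, end in intervals:
        bases.update(range(start, end))
    return bases


def base_level_accuracy(predicted_exons, reference_exons):
    """Return (sensitivity, specificity) at the base level.

    sensitivity = correctly predicted bases / total reference (cDNA) bases
    specificity = correctly predicted bases / all predicted bases
                  (correctly plus incorrectly predicted)
    """
    predicted = covered_bases(predicted_exons)
    reference = covered_bases(reference_exons)
    correct = len(predicted & reference)
    sensitivity = correct / len(reference) if reference else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Toy example: the second predicted exon is shifted by 50 bases.
reference = [(0, 100), (200, 300)]
predicted = [(0, 100), (250, 350)]
sensitivity, specificity = base_level_accuracy(predicted, reference)
print(sensitivity, specificity)
```

A set of positions is used here only for clarity; interval arithmetic would be preferable for genome-scale data.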

Because Otto uses an evidence-based approach
to reconstruct genes, the absence of
experimental evidence for intervening exons
may inadvertently result in a set of exons that
cannot be spliced together to give rise to a
transcript. In such cases, Otto may "split genes"
when in fact all the evidence should be combined
into a single transcript. We also examined
the tendency of these methods to incorrectly
split gene predictions. These trends are shown
in Fig. 8. Both RefSeq and homology-based 
predictions by Otto split known genes into few- 
er segments than Genscan alone. 



3.3 Gene number 

Recognizing that the Otto system is quite 
conservative, we used a different gene-pre- 
diction strategy in regions where the ho- 
mology evidence was less strong. Here the 
results of de novo gene predictions were 
used. For these genes, we insisted that a 
predicted transcript have at least two of the 
following types of evidence to be included 
in the gene set for further analysis: protein, 
human EST, rodent EST, or mouse genome 
fragment matches. This final class of pre- 
dicted genes is a subset of the predictions 
made by the three gene-finding programs 
that were used in the computational pipe- 
line. For these, there was not sufficient 
sequence similarity information for Otto to 
attempt to predict a gene structure. The 
three de novo gene-finding programs resulted
in about 155,695 predictions, of
which ~76,410 were nonredundant (nonoverlapping
with one another). Of these,
57,935 did not overlap known genes or
predictions made by Otto. Only 21,350 of
the gene predictions that did not overlap
Otto predictions were partially supported
by at least one type of sequence similarity
evidence, and 8619 were partially supported
by at least two types of evidence (Table 8).

The sum of this number (21,350) and the
number of Otto annotations (17,764), 39,114,
is near the upper limit for the human gene
complement. As seen in Table 8, if the re- 
quirement for other supporting evidence is 
made more stringent, this number drops rap- 
idly so that demanding two types of evidence 
reduces the total gene number to 26,383 and 
demanding three types reduces it to ~23,000. 
Requiring that a prediction be supported by 
all four categories of evidence is too stringent 
because it would eliminate genes that encode
novel proteins (members of currently unde- 
scribed protein families). No correction for 
pseudogenes has been made at this point in 
the analysis. 

In a further attempt to identify genes that 
were not found by the autoannotation process 
or any of the de novo gene finders, we ex- 
amined regions outside of gene predictions 
that were similar to the EST sequence, and 
where the EST matched the genomic se- 
quence across a splice junction. After correct- 
ing for potential 3' UTRs of predicted genes, 
about 2500 such regions remained. Addition 
of a requirement for at least one of the following
evidence types (homology to mouse
genomic sequence fragments, rodent ESTs,
or cDNAs, or similarity to a known protein)
reduced this number to 1010. Adding this to 
the numbers from the previous paragraph 
would give us estimates of about 40,000, 
27,000, and 24,000 potential genes in the 
human genome, depending on the stringency 
of evidence considered. Table 8 illustrates the 
number of genes and presents the degree of
confidence based on the supporting evidence.
Transcripts encoded by a set of 26,383 genes
were assembled for further analysis. This set
includes the 6538 genes predicted by Otto on
the basis of matches to known genes, 11,226
transcripts predicted by Otto based on homology
evidence, and 8619 from the subset of
transcripts from de novo gene-prediction programs
that have two types of supporting evidence.
The 26,383 genes are illustrated along
chromosome diagrams in Fig. 1. These are a
very preliminary set of annotations and are
subject to all the limitations of an automated
process. Considerable refinement is still necessary
to improve the accuracy of these transcript
predictions. All the predictions and
descriptions of genes and the associated evi- 
dence that we present are the product of 
completely computational processes, not ex- 
pert curation. We have attempted to enumer- 
ate the genes in the human genome in such a 
way that we have different levels of confi- 
dence based on the amount of supporting 
evidence: known genes, genes with good pro- 
tein or EST homology evidence, and de novo 
gene predictions confirmed by modest ho- 
mology evidence. 
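
The gene-number estimates quoted in this section follow from simple sums of the counts given in the text and in Table 8. The sketch below only reproduces that arithmetic; the variable names are illustrative and are not taken from the annotation pipeline.

```python
# Counts quoted in the text and in Table 8.
otto_known = 6538        # Otto predictions based on matches to known genes
otto_homology = 11226    # Otto predictions based on homology evidence
otto_total = otto_known + otto_homology

# De novo predictions that do not overlap Otto predictions,
# keyed by the minimum number of lines of supporting evidence.
de_novo_supported = {1: 21350, 2: 8619, 3: 4947}

# Spliced-EST regions outside any prediction that have additional support.
est_only_regions = 1010

gene_counts = {n: otto_total + count for n, count in de_novo_supported.items()}
with_est_regions = {n: total + est_only_regions for n, total in gene_counts.items()}

for n in sorted(gene_counts):
    print(f">={n} lines of evidence: {gene_counts[n]:,} genes; "
          f"{with_est_regions[n]:,} including EST-only regions")
```

The three totals that include the EST-only regions round to the estimates of about 40,000, 27,000, and 24,000 potential genes given in the text.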

3.4 Features of human gene 
transcripts 

We estimate the average span for a "typi- 
cal" gene in the human DNA sequence to 
be about 27,894 bases. This is based on the 
average span covered by RefSeq tran- 
scripts, used because it represents our high- 
est confidence set. 

The set of transcripts promoted to gene 
annotations varies in a number of ways. As 
can be seen from Table 8 and Fig. 9, tran- 
scripts predicted by Otto tend to be longer, 
having on average about 7.8 exons, whereas 
those promoted from gene-prediction pro- 
grams average about 3.7 exons. The largest 
number of exons that we have identified in a 
transcript is 234 in the titin mRNA. Table 8 
compares the amounts of evidence that support
the Otto and other predicted transcripts.
For example, one can see that a typical Otto
transcript has 6.99 of its 7.81 exons supported
by protein homology evidence. As would be
expected, the Otto transcripts generally have
more support than do transcripts predicted by 
the de novo methods. 

4 Genome Structure 

Summary. This section describes several of
the noncoding attributes of the assembled
genome sequence and their correlations with
the predicted gene set. These include an analysis
of G+C content and gene density in the
context of cytogenetic maps of the genome,
an enumerative analysis of CpG islands, and
a brief description of the genome-wide repetitive
elements.



4.1 Cytogenetic maps

Perhaps the most obvious, and certainly the
most visible, element of the structure of
the genome is the banding pattern produced
by Giemsa stain. Chromosomal banding
studies have revealed that about 17% to
20% of the human chromosome complement
consists of C-bands, or constitutive
heterochromatin (64). Much of this heterochromatin
is highly polymorphic and consists
of different families of alpha satellite
DNAs with various higher-order repeat
structures (65). Many chromosomes have
complex inter- and intrachromosomal duplications
present in pericentromeric regions
(66). About 5% of the sequence reads
were identified as alpha satellite sequences;
these were not included in the assembly.

Fig. 8. Analysis of split genes resulting from different annotation methods. A set of 4512
Sim4-based alignments of RefSeq transcripts to the genomic assembly were chosen (see the text
for criteria), and the numbers of overlapping Genscan, Otto (RefSeq only) annotations based solely
on Sim4-polished RefSeq alignments, and Otto (homology) annotations (annotations produced by
supplying all available evidence to Genscan) were tallied. These data show the degree to which
multiple Genscan predictions and/or Otto annotations were associated with a single RefSeq
transcript. The zero class for the Otto-homology predictions shown here indicates that the
Otto-homology calls were made without recourse to the RefSeq transcript, and thus no Otto call
was made because of insufficient evidence.
                                     Types of evidence                         No. of lines of evidence*
                          Total      Mouse     Rodent    Protein   Human       >=1       >=2       >=3       >=4

Otto
  Number of transcripts   17,969     17,065    14,881    15,477    16,374      17,968†   17,501    15,877    12,451
  Number of exons         141,218    111,174   89,569    108,431   118,869     140,710   127,955   99,574    59,804

De novo
  Number of transcripts   58,032     14,463    5,094     8,043     9,220       21,350    8,619     4,947     1,904
  Number of exons         319,935    48,594    19,344    26,264    40,104      79,148    31,130    17,508    6,520

No. of exons per transcript
  Otto                    7.84       5.77      6.01      6.99      7.24        7.81      7.19      6.00      4.28
  De novo                 5.53       3.17      3.80      3.27      4.36        3.7       3.56      3.42      3.16

*[illegible] (conservation in 3X mouse genomic DNA, similarity to human EST or cDNA, similarity to rodent EST or cDNA, and similarity to known proteins) were [...]
Examination of pericentromeric regions is
ongoing.

The remaining ~80% of the genome, the
euchromatic component, is divisible into G-,
R-, and T-bands (67). These cytogenetic bands
have been presumed to differ in their nucleotide
composition and gene density, although we
have been unable to determine precise band
boundaries at the molecular level. T-bands are
the most G+C- and gene-rich, and G-bands are
G+C-poor (68). Bernardi has also offered a
description of the euchromatin at the molecular
level as long stretches of DNA of differing base
composition, termed isochores (denoted L, H1,
H2, and H3), which are >300 kbp in length
(69). Bernardi defined the L (light) isochores as
G+C-poor (<43%), whereas the H (heavy)
isochores fall into three G+C-rich classes representing
24, 8, and 5% of the genome. Gene
concentration has been claimed to be very low
in the L isochores and 20-fold more enriched in
the H2 and H3 isochores (70). By examining
contiguous 50-kbp windows of G+C content
across the assembly, we found that regions of
G+C content >48% (H3 isochores) averaged
273.9 kbp in length, those with G+C content
between 43 and 48% (H1+H2 isochores) averaged
202.8 kbp in length, and the average span
of regions with <43% (L isochores) was
1078.6 kbp. The correlation between G+C
content and gene density was also examined in
50-kbp windows along the assembled sequence
(Table 9 and Figs. 10 and 11). We found that
the density of genes was greater in regions of
high G+C than in regions of low G+C content,
as expected. However, the correlation between
G+C content and gene density was not as
skewed as previously predicted (69). A higher
proportion of genes were located in the G+C-poor
regions than had been expected.
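
The windowed G+C analysis described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: it assumes non-overlapping windows, ignores undetermined bases (N), and assigns the boundary values of exactly 43% and 48% to the H1/H2 class, none of which is specified in the text. The function names are hypothetical.

```python
def gc_percent(seq):
    """Percent G+C among determined bases (A, C, G, T); None if there are none."""
    seq = seq.upper()
    determined = sum(seq.count(base) for base in "ACGT")
    if determined == 0:
        return None
    return 100.0 * (seq.count("G") + seq.count("C")) / determined


def classify_isochore(percent):
    """Map a G+C percentage to the isochore classes used in Table 9."""
    if percent > 48:
        return "H3"
    if percent >= 43:
        return "H1/H2"
    return "L"


def window_classes(seq, window=50_000):
    """Classify each contiguous, non-overlapping window of the sequence."""
    classes = []
    for start in range(0, len(seq) - window + 1, window):
        percent = gc_percent(seq[start:start + window])
        label = None if percent is None else classify_isochore(percent)
        classes.append((start, label))
    return classes


# Toy example with a 20-base window instead of 50 kbp.
toy = "GC" * 10 + "AT" * 10
print(window_classes(toy, window=20))
```

Runs of adjacent windows with the same label could then be merged to estimate the average span of each class.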

Chromosomes 17, 19, and 22, which have 
a disproportionate number of H3-containing 
bands, had the highest gene density (Table 
10). Conversely, of the chromosomes that we 



Fig. 9. Comparison of 
the number of exons 
per transcript between 
the 17,968 Otto transcripts
and 21,350 de novo
transcript predictions
with at least one
line of evidence that 
do not overlap with an 
Otto prediction. Both 
sets have the highest 
number of transcripts 
in the two-exon cate- 
gory, but the de novo 
gene predictions are 
skewed much more 
toward smaller tran- 
scripts. In the Otto set 
19.7% of the tran- 
scripts have one or 
two exons, and 5.7% 
found to have the lowest gene density, X, 4,
18, 13, and Y, also have the fewest H3 bands.
Chromosome 15, which also has few H3
bands, did not have a particularly low gene
density in our analysis. In addition, chromosome
8, which we found to have a low gene
density, does not appear to be unusual in its 
H3 banding. 

How valid is Ohno's postulate (71) that
mammalian genomes consist of oases of genes
in otherwise essentially empty deserts? It appears
that the human genome does indeed contain
deserts, or large, gene-poor regions. If we
define a desert as a region >500 kbp without a
gene, then we see that 605 Mbp, or about 20%
of the genome, is in deserts. These are not
uniformly distributed over the various chromosomes.
Gene-rich chromosomes 17, 19, and 22
have only about 12% of their collective 171
Mbp in deserts, whereas gene-poor chromosomes
4, 13, 18, and X have 27.5% of their 492
Mbp in deserts (Table 11). The apparent lack of
predicted genes in these regions does not necessarily
imply that they are devoid of biological
function.
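
The desert definition used above (a region >500 kbp without a gene) can be made concrete with a short sketch. It assumes that genes are given as (start, end) coordinates on one chromosome and that gene-free stretches at the chromosome ends also count as deserts; the latter is an assumption, and the function name is hypothetical.

```python
def find_deserts(genes, chrom_length, min_size=500_000):
    """Return gene-free intervals longer than min_size on one chromosome.

    genes: iterable of (start, end) gene coordinates, in any order.
    """
    deserts = []
    cursor = 0  # end of the right-most gene seen so far
    for start, end in sorted(genes):
        if start - cursor > min_size:
            deserts.append((cursor, start))
        cursor = max(cursor, end)
    if chrom_length - cursor > min_size:
        deserts.append((cursor, chrom_length))
    return deserts


# Toy chromosome of 2.4 Mbp with three genes.
genes = [(1_000_000, 1_050_000), (1_200_000, 1_300_000), (2_000_000, 2_100_000)]
deserts = find_deserts(genes, chrom_length=2_400_000)
desert_bases = sum(end - start for start, end in deserts)
print(deserts, desert_bases)

# Genome-wide fraction quoted in the text: 605 Mbp of a 2.91-Gbp genome.
print(round(605 / 2910, 3))
```

The last line gives about 0.208, in line with the "about 20%" figure above.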

4.2 Linkage map 

Linkage maps provide the basis for genetic
analysis and are widely used in the study of the
inheritance of traits and in the positional cloning
of genes. The distance metric, centimorgans
(cM), is based on the recombination rate between
homologous chromosomes during meiosis.
In general, the rate of recombination in
females is greater than that in males, and this
degree of map expansion is not uniform across
the genome (72). One of the opportunities enabled
by a nearly complete genome sequence is
to produce the ultimate physical map, and to
fully analyze its correspondence with two other
maps that have been widely used in genome
and genetic analysis: the linkage map and the
cytogenetic map. This would close the loop
between the mapping and sequencing phases of
the genome project.

Table 9. Characteristics of G+C in isochores.

We mapped the location of the markers 
that constitute the Genethon linkage map to 
the genome. The rate of recombination, ex- 
pressed as cM per Mbp, was calculated for 
3-Mbp windows as shown in Table 12. Higher
rates of recombination in the telomeric
region of the chromosomes have been previously
documented (73). From this mapping
result, there is a difference of 4.99 between
lowest rates and highest rates and the largest
difference of 4.4 between males and females
(4.99 to 0.47 on chromosome 16). This indicates
that the variability in recombination
rates among regions of the genome exceeds
the differences in recombination rates between
males and females. The human genome
has recombination hotspots, where recombination
rates vary fivefold or more over
a space of 1 kbp, so the picture one gets of the
magnitude of variability in recombination
rate will depend on the size of the window 



Isochore   G+C (%)   Fraction of genome         Fraction of genes
                     Predicted*   Observed      Predicted*   Observed

H3         >48       5            9.5           37           24.8
H1/H2      43-48     25           21.2          32           26.6
L          <43       67           69.2          31           48.5

*The predictions were based on Bernardi's definitions (70) of the isochore structure of the human genome.
[Fig. 9 axis label] Number of exons per transcript

[Fig. 9 caption, continued] have more than 20. In the de novo set, 49.3% of the transcripts have one or two exons, and 0.2% have more than 20.
examined. Unfortunately, too few meiotic
crossovers have occurred in Centre d'Etude
du Polymorphisme Humain (CEPH) and other
reference families to provide a resolution any
finer than about 3 Mbp. The next challenge
will be to determine a sequence basis of
recombination at the chromosomal level. An
accurate predictor of the variation in
recombination rates between any pair of
markers would be extremely useful in designing
markers to narrow a region of linkage,
such as in positional cloning projects.
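
The windowed recombination rate (cM per Mbp) discussed above can be sketched as follows. This is one plausible way to compute it, assuming each marker has both a physical position (bp) and a genetic map position (cM); the exact windowing procedure used for Table 12 is not described in the text, and the function name is hypothetical.

```python
def recombination_rates(markers, window_bp=3_000_000):
    """Estimate cM/Mbp per window from (position_bp, position_cM) markers.

    Within each window, the rate is the genetic distance between the first
    and last marker divided by their physical distance in Mbp. Windows with
    fewer than two distinct marker positions are skipped.
    """
    by_window = {}
    for position_bp, position_cm in sorted(markers):
        by_window.setdefault(position_bp // window_bp, []).append(
            (position_bp, position_cm))

    rates = {}
    for index, points in by_window.items():
        (first_bp, first_cm), (last_bp, last_cm) = points[0], points[-1]
        if last_bp == first_bp:
            continue
        rates[index] = (last_cm - first_cm) / ((last_bp - first_bp) / 1e6)
    return rates


# Toy markers: (physical position in bp, genetic position in cM).
markers = [(0, 0.0), (2_000_000, 3.0), (3_000_000, 4.0), (5_000_000, 5.0)]
rates = recombination_rates(markers)
print(rates)
```

As the text notes, the apparent variability in rate depends strongly on the window size chosen.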

4.3 Correlation between CpG islands
and genes

CpG islands are stretches of unmethylated 
DNA with a higher frequency of CpG 
dinucleotides when compared with the entire 
genome (74). CpG islands are believed to 
preferentially occur at the transcriptional start 
of genes, and it has been observed that most 
housekeeping genes have CpG islands at the 
5' end of the transcript (75,76). In addition, 
experimental evidence indicates that CpG is- 
land methylation is correlated with gene in- 
activation (77) and has been shown to be 
important during gene imprinting (78) and 
tissue-specific gene expression (79).

Experimental methods have been used 
that resulted in an estimate of 30,000 to 
45,000 CpG islands in the human genome 
(74, 80) and an estimate of 499 CpG islands
on human chromosome 22 (81). Larsen et
al. (76) and Gardiner-Garden and Frommer
(75) used a computational method to identify
CpG islands and defined them as regions
of DNA of >200 bp that have a G+C
content of >50% and a ratio of observed
versus expected frequency of CG dinucleotide
>0.6.

It is difficult to make a direct compari- 
son of experimental definitions of CpG is- 
lands with computational definitions be- 
cause computational methods do not con- 
sider the methylation state of cytosine and 
experimental methods do not directly select 
regions of high G+C content. However, we 
can determine the correlation of CpG island 
with gene starts, given a set of annotated 
genomic transcripts and the whole genome
sequence. We have analyzed the publicly
available annotation of chromosome 22, as
well as using the entire human genome in
our assembly and the computationally annotated
genes. A variation of the CpG island
computation was compared with
Larsen et al. (76). The main differences are
that we use a sliding window of 200 bp, 
consecutive windows are merged only if 
they overlap, and we recompute the CpG 
value upon merging, thus rejecting any po- 
tential island if it scores less than the 
threshold. 
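
The island computation described above can be sketched as follows. The observed/expected ratio is written here in the commonly used form (number of CG dinucleotides x length) / (number of C x number of G); the text does not spell out the formula, so this, the 1-bp step of the sliding window, and all function names are assumptions made for illustration.

```python
def cpg_stats(seq):
    """Return (G+C fraction, observed/expected CG ratio) for a sequence."""
    seq = seq.upper()
    length = len(seq)
    c_count, g_count = seq.count("C"), seq.count("G")
    cg_count = seq.count("CG")
    gc_fraction = (c_count + g_count) / length if length else 0.0
    if c_count == 0 or g_count == 0:
        return gc_fraction, 0.0
    return gc_fraction, (cg_count * length) / (c_count * g_count)


def passes(seq, min_gc, min_ratio):
    gc_fraction, ratio = cpg_stats(seq)
    return gc_fraction > min_gc and ratio > min_ratio


def find_cpg_islands(seq, window=200, step=1, min_gc=0.5, min_ratio=0.6):
    """Slide a window, merge overlapping hits, and re-score each merged region."""
    hits = [(start, start + window)
            for start in range(0, len(seq) - window + 1, step)
            if passes(seq[start:start + window], min_gc, min_ratio)]

    merged = []
    for start, end in hits:
        if merged and start < merged[-1][1]:   # merge only if windows overlap
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))

    # Recompute the statistics on each merged region and reject failures.
    return [(start, end) for start, end in merged
            if passes(seq[start:end], min_gc, min_ratio)]


# Toy sequence: a CG-rich block of 300 bp flanked by AT repeats.
toy = "AT" * 150 + "CG" * 150 + "AT" * 150
islands = find_cpg_islands(toy)                      # threshold 0.6 (method 1)
strict = find_cpg_islands(toy, min_ratio=0.8)        # threshold 0.8 (method 2)
print(islands, strict)
```

Raising `min_ratio` from 0.6 to 0.8 corresponds to the difference between method 1 and method 2 in the next paragraph.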

To compute various CpG statistics, we 
used two different thresholds of CG dinucle- 
otide likelihood ratio. Besides using the orig- 
inal threshold of 0.6 (method 1), we used a 
higher threshold of CG dinucleotide likelihood
ratio of 0.8 (method 2), which results in
the number of CpG islands on chromosome
22 close to the number of annotated genes on
this chromosome. The main results are summarized
in Table 13. CpG islands computed
with method 1 predicted only 2.6% of the
CSA sequence as CpG, but 40% of the gene
starts (start codons) are contained inside a 



CpG island. This is comparable to ratios re- 
ported by others (82). The last two rows of 
the table show the observed and expected 
average distance, respectively, of the closest 
CpG island from the first exon. The observed
average distances to the closest CpG island are
smaller than the corresponding expected distances,
confirming an association between CpG island
and the first exon.

We also looked at the distribution of CpG 
island nucleotides among various sequence 
classes such as intergenic regions, introns,
exons, and first exons. We computed the 
likelihood score for each sequence class as 
the ratio of the observed fraction of CpG 
island nucleotides in that sequence class 
and the expected fraction of CpG island 
nucleotides in that sequence class. The results
of applying method 1 to CSA were
scores of 0.89 for intergenic region, 1.2 for 
intron, 5.86 for exon, and 13.2 for first 
exon. The same trend was also found for 
chromosome 22 and after the application of 
a higher threshold (method 2) on both data 
sets. In sum, genome-wide analysis has 
extended earlier analysis and suggests a 
strong correlation between CpG islands and 
first coding exons. 
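
The likelihood score defined above is a ratio of two fractions. In the sketch below, the expected fraction for a sequence class is taken to be that class's share of all nucleotides, which is an assumption; the text does not state how the expectation was derived. The counts are toy values, not data from the analysis.

```python
def likelihood_scores(island_bases, total_bases):
    """Observed/expected fraction of CpG island nucleotides per sequence class.

    island_bases: CpG island nucleotides falling in each class.
    total_bases:  all nucleotides in each class.
    """
    all_island = sum(island_bases.values())
    all_bases = sum(total_bases.values())
    scores = {}
    for name, total in total_bases.items():
        observed = island_bases.get(name, 0) / all_island
        expected = total / all_bases   # assumed: the class's share of the sequence
        scores[name] = observed / expected
    return scores


# Toy counts: exons hold half of the island bases but only 10% of the sequence.
scores = likelihood_scores(
    island_bases={"intergenic": 50, "exon": 50},
    total_bases={"intergenic": 900, "exon": 100},
)
print(scores)
```

A score above 1 indicates that CpG island nucleotides are enriched in that class, as reported above for exons and first exons.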
4.4 Genome-wide repetitive elements

The proportion of the genome covered by 
various classes of repetitive DNA is present- 
ed in Table 14. We observed about 35% of 
the genome in these repeat classes, very sim- 
ilar to values reported previously (83). Repet- 
itive sequence may be underrepresented in 
the Celera assembly as a result of incomplete 
repeat resolution, as discussed above. About 
8% of the scaffold length is in gaps, and we 
expect that much of this is repetitive se- 
quence. Chromosome 19 has the highest re- 
peat density (57%), as well as the highest 
gene density (Table 10). Of interest, among
the different classes of repeat elements, we 
observe a clear association of Alu elements 
and gene density, which was not observed 
between LINEs and gene density. 

5 Genome Evolution 

Summary. The dynamic nature of genome 
evolution can be captured at several levels. 
These include gene duplications mediated by 
RNA intermediates (retrotransposition) and 
segmental genomic duplications. In this sec- 
tion, we document the genome-wide occur- 
rence of retrotransposition events generating 
functional (intronless paralogs) or inactive 
genes (pseudogenes). Genes involved in 
translational processes and nuclear regulation 
account for nearly 50% of all intronless paralogs
and processed pseudogenes detected in
our survey. We have also cataloged the extent 
of segmental genomic duplication and pro- 
vide evidence for 1077 duplicated blocks 
covering 3522 distinct genes. 
Fig. 11 (continued). Relation among gene density (orange), G+C content
(green), EST density (blue), and Alu density (pink) along the lengths of
each of the chromosomes. Gene density was calculated in 1-Mbp windows.
The percent of G+C nucleotides was calculated in 100-kbp
windows. The number of ESTs and Alu elements is shown per 100-kbp
window.



5.1 Retrotransposition in the human 
genome 

Retrotransposition of processed mRNA 
transcripts into the genome results in func- 
tional genes, called intronless paralogs, or 
inactivated genes (pseudogenes). A paralog 
refers to a gene that appears in more than 
one copy in a given organism as a result of 



a duplication event. The existence of both 
intron-containing and intronless forms of 
genes encoding functionally similar or 
identical proteins has been previously de- 
scribed (84, 85). Cataloging these evolu- 
tionary events on the genomic landscape is 
of value in understanding the functional 
consequences of such gene-duplication 



events in cellular biology. Identification of 
conserved intronless paralogs in the mouse 
or other mammalian genomes should pro- 
vide the basis for capturing the evolution- 
ary chronology of these transposition 
events and provide insights into gene loss 
and accretion in the mammalian radiation. 
A set of proteins corresponding to all 901
Otto-predicted, single-exon genes were subjected
to BLAST analysis against the proteins
encoded by the remaining multiexon predict- 
ed transcripts. Using homology criteria of 
70% sequence identity over 90% of the 
length, we identified 298 instances of single- 
to multi-exon correspondence. Of these 298 
sequences, 97 were represented in the Gen- 
Bank data set of experimentally validated 
full-length genes at the stringency specified 
and were verified by manual inspection. 
We believe that these 97 cases may represent
intronless paralogs (see Web table 1 on
Science Online at www.sciencemag.org/cgi/content/full/291/5507/1304/DC1) of known
genes. Most of these are flanked by direct 
repeat sequences, although the precise nature 
of these repeats remains to be determined. All 
of the cases for which we have high confi- 
dence contain polyadenylated [poly(A)] tails 
characteristic of retrotransposition. 
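
The homology criteria used above to pair single-exon and multiexon genes (70% sequence identity over 90% of the length) amount to a simple filter on alignment results. The sketch below assumes a hypothetical `Hit` record holding summary values taken from a BLAST report; it does not call BLAST itself, and the record layout and identifiers are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Summary of one alignment (hypothetical record, not a BLAST API type)."""
    query: str
    subject: str
    percent_identity: float   # percent identical residues in the alignment
    aligned_length: int       # number of query residues aligned
    query_length: int         # full length of the query sequence


def meets_criteria(hit, min_identity=70.0, min_coverage=0.90):
    """True if the hit has enough identity over enough of the query length."""
    coverage = hit.aligned_length / hit.query_length
    return hit.percent_identity >= min_identity and coverage >= min_coverage


hits = [
    Hit("single_exon_A", "multi_exon_1", 85.0, 290, 300),   # passes both tests
    Hit("single_exon_B", "multi_exon_2", 95.0, 150, 300),   # too short
    Hit("single_exon_C", "multi_exon_3", 60.0, 300, 300),   # identity too low
]
candidates = [hit.query for hit in hits if meets_criteria(hit)]
print(candidates)
```

The thresholds are exposed as parameters so that other cutoffs can be applied with the same function.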

Recent publications describing the phe- 
nomenon of functional intronless paralogs 
speculate that retrotransposition may serve as 
a mechanism used to escape X-chromosomal 
inactivation (84, 86). We do not find a bias 
toward X chromosome origination of these 
retrotransposed genes; rather, the results 
show a random chromosome distribution of 
both the intron-containing and corresponding 
intronless paralogs. We also have found sev- 
eral cases of retrotransposition from a single 
source chromosome to multiple target chro- 
mosomes. Interesting examples include the 
retrotransposition of a five exon-containing 
ribosomal protein L21 gene on chromosome 
13 onto chromosomes 1, 3, 4, 7, 10, and 14, 
respectively. The size of the source genes can 
also show variability. The largest example is 
the 31-exon diacylglycerol kinase zeta gene 
on chromosome 11 that has an intronless 
paralog on chromosome 13. Regardless of 
route, retrotransposition with subsequent 
gene changes in coding or noncoding regions 
that lead to different functions or expression 
patterns, represents a key route to providing 
an enhanced functional repertoire in mam- 
mals (87). 

Our preliminary set of retrotransposed intronless
paralogs contains a clear overrepresentation
of genes involved in translational
processes (40% ribosomal proteins and 10% 
translation elongation factors) and nuclear 
regulation (HMG nonhistone proteins, 4%), 
as well as metabolic and regulatory enzymes. 
EST matches specific to a subset of intronless 
paralogs suggest expression of these intron- 
less paralogs. Differences in the upstream 
regulatory sequences between the source 
genes and their intronless paralogs could ac- 
count for differences in tissue-specific gene 
expression. Defining which, if any, of these 
processed genes are functionally expressed 
and translated will require further elucidation 
and experimental validation. 



5.2 Pseudogenes

A pseudogene is a nonfunctional copy that is
very similar to a normal gene but that has
been altered slightly so that it is not expressed.
We developed a method for the preliminary
analysis of processed pseudogenes
in the human genome as a starting point in
elucidating the ongoing evolutionary forces

Table 11. Genome overview.


Size of the genome (including gaps)                                   2.91 Gbp
Size of the genome (excluding gaps)                                   2.66 Gbp
Longest contig                                                        1.99 Mbp
Longest scaffold                                                      14.4 Mbp
Percent of A+T in the genome                                          54
Percent of G+C in the genome                                          38
Percent of undetermined bases in the genome                           9
Most GC-rich 50 kb                                                    Chr. 2 (66%)
Least GC-rich 50 kb                                                   Chr. X (25%)
Percent of genome classified as repeats                               35
Number of annotated genes                                             26,383
Percent of annotated genes with unknown function                      42
Number of genes (hypothetical and annotated)                          39,114
Percent of hypothetical and annotated genes with unknown function     59
Gene with the most exons                                              Titin (234 exons)
Average gene size                                                     27 kbp
Most gene-rich chromosome                                             Chr. 19 (23 genes/Mb)
Least gene-rich chromosomes                                           Chr. 13 (5 genes/Mb), Chr. Y (5 genes/Mb)
Total size of gene deserts (>500 kb with no annotated genes)          605 Mbp
Percent of base pairs spanned by genes                                25.5 to 37.8*
Percent of base pairs spanned by exons                                1.1 to 1.4*
Percent of base pairs spanned by introns                              24.4 to 36.4*
Percent of base pairs in intergenic DNA                               74.5 to 63.6*
Chromosome with highest proportion of DNA in annotated exons          Chr. 19 (9.33)
Chromosome with lowest proportion of DNA in annotated exons           Chr. Y (0.36)
Longest intergenic region (between annotated + hypothetical genes)    Chr. 13 (3,038,416 bp)
Rate of SNP variation                                                 1/1250 bp

*In these ranges, the percentages correspond to the annotated gene set (26,383 genes) and the hypothetical +
annotated gene set (39,114 genes), respectively.

Table 12. Rate of recombination per physical distance (cM/Mb) across the genome. Genethon markers
were placed on CSA-mapped assemblies, and then relative physical distances and rates were calculated
in 3-Mb windows for each chromosome. NA, not applicable.



            Male                   Sex-average            Female
Chrom.      Max    Avg.   Min.     Max    Avg.   Min.     Max    Avg.   Min.

1           2.60   1.12   0.23     2.81   1.42   0.52     3.39   1.76   0.68
2           2.23   0.78   0.33     2.65   1.12   0.54     3.17   1.40   0.61
3           2.55   0.86   0.23     2.40   1.07   0.42     2.71   1.30   0.33
4           1.66   0.67   0.15     2.06   1.04   0.60     2.50   1.40   0.77
5           2.00   0.67   0.18     1.87   1.08   0.42     2.26   1.43   0.62
6           1.97   0.71   0.28     2.57   1.12   0.37     3.47   1.67   0.64
7           2.34   1.16   0.48     1.67   1.17   0.47     2.27   1.21   0.34
8           1.83   0.73   0.14     2.40   1.05   0.46     3.44   1.36   0.43
9           2.01   0.99   0.53     1.95   1.32   0.77     2.63   1.66   0.82
10          3.73   1.03   0.22     3.05   1.29   0.66     2.84   1.51   0.76
11          1.43   0.72   0.31     2.13   0.99   0.47     3.10   1.32   0.49
12          4.12   0.76   0.26     3.35   1.16   0.49     2.93   1.55   0.59
13          1.60   0.75   0.01     1.87   0.95   0.17     2.49   1.19   0.32
14          3.15   0.98   0.18     2.65   1.30   0.62     3.14   1.63   0.75
15          2.28   0.94   0.34     2.31   1.22   0.42     2.53   1.56   0.54
16          1.83   1.00   0.47     2.70   1.55   0.63     4.99   2.32   1.12
17          3.87   0.87   0.00     3.54   1.35   0.54     4.19   1.83   0.94
18          3.12   1.37   0.86     3.75   1.66   0.43     4.35   2.24   0.72
19          3.02   0.97   0.10     2.57   1.41   0.49     2.89   1.75   0.87
20          3.64   0.89   0.00     2.79   1.50   0.83     3.31   2.15   1.34
21          3.23   1.26   0.69     2.37   1.62   1.08     2.58   1.90   1.18
22          1.25   1.10   0.84     1.88   1.41   1.08     3.73   2.08   0.93
X           NA     NA     NA       NA     NA     NA       3.12   1.64   0.72
Y           NA     NA     NA       NA     NA     NA       NA     NA     NA
Genome      4.12   0.88   0.00     3.75   1.22   0.17     4.99   1.55   0.32
that account for gene inactivation. The general
structural characteristics of these processed
pseudogenes include the complete
lack of intervening sequences found in the
functional counterparts, a poly(A) tract at the
3' end, and direct repeats flanking the pseudogene
sequence. Processed pseudogenes occur
as a result of retrotransposition, whereas
unprocessed pseudogenes arise from segmental
genome duplication.

We searched the complete set of Otto-predicted
transcripts against the genomic sequence
by means of BLAST. Genomic regions
corresponding to all Otto-predicted
transcripts were excluded from this analysis.
We identified 2909 regions matching with
greater than 70% identity over at least 70% of
the length of the transcripts that likely represent
processed pseudogenes. This number is
probably an underestimate because specific
methods to search for pseudogenes were not
used.

We looked for correlations between 
structural elements and the propensity for 
retrotransposition in the human genome. 
GC content and transcript length were compared
between the genes with processed
pseudogenes (1177 source genes) versus
the remainder of the predicted gene set.
Transcripts that give rise to processed pseudogenes
have shorter average transcript
length (1027 bp versus 1594 bp for the Otto
set) as compared with genes for which no
pseudogene was detected. The overall GC
content did not show any significant difference,
contrary to a recent report (88). There
is a clear trend in gene families that are
present as processed pseudogenes. These
include ribosomal proteins (67%), lamin
receptors (10%), translation elongation factor
alpha (5%), and HMG-non-histone proteins
(2%). The increased occurrence of
retrotransposition (both intronless paralogs
and processed pseudogenes) among genes
involved in translation and nuclear regulation
may reflect an increased transcriptional
activity of these genes.

5.3 Gene duplication in the human 
genome 

Building on a previously published procedure 
(27), we developed a graph-theoretic algo- 
rithm, called Lek, for grouping the predicted 
human protein set into protein families (89). 



Table 13. Characteristics of CpG islands identified in chromosome 22 (34-Mbp sequence length) and the
whole genome (2.9-Gbp sequence length) by means of two different methods. Method 1 uses a CG
likelihood ratio of ≥0.6. Method 2 uses a CG likelihood ratio of ≥0.8.

                                         Chromosome 22         Whole genome (CS assembly)
                                         Method 1   Method 2   Method 1   Method 2
Number of CpG islands detected           5,211      522        195,706    26,876
Average length of island (bp)            390        535        395        497
Percent of sequence predicted as CpG     5.9        0.8        2.6        0.4
Percent of first exons that overlap
  a CpG island                           44         25         42         22
Percent of first exons with first
  position of exon contained inside
  a CpG island                           37         22         40         21
Average distance between first exon
  and closest CpG island (bp)            1,013      10,486     2,182      17,021
Expected distance between first exon
  and closest CpG island (bp)            3,262      32,567     7,164      55,811

Table 14. Distribution of repetitive DNA in the compartmentalized shotgun assembly sequence.

Repetitive elements                           Megabases in   Percent of   Previously
                                              assembled      assembly     predicted
                                              sequences                   (%) (83)
Alu                                           288            9.9          10.0
Mammalian interspersed repeat (MIR)           66             2.3          1.7
Medium reiteration (MER)                      50             1.7          1.6
Long terminal repeat (LTR)                    155            5.3          5.6
Long interspersed nucleotide element (LINE)   466            16.1         16.7
Total                                         1025           35.3         35.6



The complete clusters that result from the 
Lek clustering provide one basis for compar- 
ing the role of whole-genome or chromosom- 
al duplication in protein family expansion as 
opposed to other means, such as tandem duplication.
Because each complete cluster represents
a closed and certain island of homology,
and because Lek is capable of simultaneously
clustering protein complements of
several organisms, the number of proteins
contributed by each organism to a complete
cluster can be predicted with confidence de- 
pending on the quality of the annotation of 
each genome. The variance of each organism's
contribution to each cluster can then be
calculated, allowing an assessment of the relative
importance of large-scale duplication
versus smaller-scale, organism-specific ex- 
pansion and contraction of protein families, 
presumably as a result of natural selection 
operating on individual protein families with- 
in an organism. As can be seen in Fig. 12, the 
large variance in the relative numbers of hu- 
man as compared with D. melanogaster and 
Caenorhabditis elegans proteins in complete 
clusters may be explained by multiple events 
of relative expansions in gene families in 
each of the three animal genomes. Such ex- 
pansions would give rise to the distribution 
that shows a peak at 1:1 in the ratio for
human-worm or human-fly clusters with the
slope spread covering both human and fly/worm
predominance, as we observed (Fig.
12). Furthermore, there are nearly as many 
clusters where worm and fly proteins pre- 
dominate despite the larger numbers of pro- 
teins in the human. At face value, this anal- 
ysis suggests that natural selection acting on 
individual protein families has been a major 
force driving the expansion of at least some 
elements of the human protein set. However,
in our analysis, the difference between an 
ancient whole-genome duplication followed 
by loss, versus piecemeal duplication, cannot 
be easily distinguished. In order to differen- 
tiate these scenarios, more extended analyses 
were performed. 

5.4 Large-scale duplications 

Using two independent methods, we 
searched for large-scale duplications in the 
human genome. First, we describe a protein 
family-based method that identified highly 
conserved blocks of duplication. We then 
describe our comprehensive method for identifying
all interchromosomal block duplications.
The latter method identified a large number of 
duplicated chromosomal segments covering 
parts of all 24 chromosomes. 

The first of the methods is based on the 
idea of searching for blocks of highly con- 
served homologous proteins that occur in 
more than one location on the genome. For 
this comparison, two genes were considered 
equivalent if their protein products were determined
to be in the same family and the
same complete Lek cluster (essentially 
paralogous genes) (89). Initially, each chro- 
mosome was represented as a string of genes 
ordered by the start codons for predicted 
genes along the chromosome. We considered 
the two strands as a single string, because 
local inversions are relatively common events 
relative to large-scale duplications. Each 
gene was indexed according to the protein 
family and Lek complete cluster (89). All 
pairs of indexed gene strings were then
aligned in both the forward and reverse directions
with the Smith-Waterman algorithm
(90). A match between two proteins of the 
same Lek complete cluster was given a score 
of 10 and a mismatch -10, with gap open 
and extend penalties of -4 and -1. With 
these parameters, 19 conserved interchromo- 
somal blocks of duplication were observed, 
all of which were also detected and expanded 
by the comprehensive method described be- 
low. The detection of only a relatively small 
number of block duplications was a conse- 
quence of using an intrinsically conservative 
method grounded in the conservative con- 
straints of the complete Lek clusters. 
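
The block-detection step above can be sketched as a local alignment over strings of cluster identifiers rather than residues. The following is a minimal illustration, not the authors' implementation: the function names are hypothetical, each gene is reduced to an integer cluster ID, and a gap of length k is assumed to cost the open penalty plus (k - 1) times the extend penalty, a convention the text does not specify.

```python
def local_align_score(a, b, match=10, mismatch=-10, gap_open=-4, gap_extend=-1):
    """Best Smith-Waterman local alignment score with affine gaps (Gotoh recurrences).

    a, b: sequences of cluster IDs, one per gene, in chromosomal order.
    Scores follow the text: +10 match, -10 mismatch, gap open -4, gap extend -1.
    """
    n, m = len(a), len(b)
    neg_inf = float("-inf")
    H = [[0] * (m + 1) for _ in range(n + 1)]        # best score ending at (i, j)
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in a
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]  # alignment ends with a gap in b
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] + gap_open, E[i][j - 1] + gap_extend)
            F[i][j] = max(H[i - 1][j] + gap_open, F[i - 1][j] + gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def best_block_score(a, b):
    """Align in both forward and reverse directions, as described in the text."""
    return max(local_align_score(a, b), local_align_score(a, b[::-1]))


# Toy gene strings (cluster IDs invented for illustration).
forward = local_align_score([1, 2, 3, 4], [9, 1, 2, 3, 4, 8])   # four matches in a row
with_gap = local_align_score([1, 2, 3, 4], [1, 2, 7, 3, 4])     # one unrelated gene inserted
inverted = best_block_score([1, 2, 3], [3, 2, 1])               # block present in reverse order
```

Treating both strands as one string, as the text describes, is what makes the reverse-direction pass necessary: an inverted block scores only a single match in the forward direction.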

In the second, more comprehensive ap- 
proach, we aligned all chromosomes directly 
with one another using an algorithm based on 
the MUMmer system (91). This alignment 
method uses a suffix tree data structure and a 
linear-time algorithm to align long sequences 
very rapidly; for example, two chromosomes 
of 100 Mbp can be aligned in less than 20 
min (on a Compaq Alpha computer) with 4 
gigabytes of memory. This procedure was 
used recently to identify numerous large- 
scale segmental duplications among the five 
chromosomes of A. thaliana (92); in that 
organism, the method revealed that 60% of 
the genome (66 Mbp) is covered by 24 very 
large duplicated segments. For Arabidopsis, a 
DNA-based alignment was sufficient to re- 
veal the segmental duplications between 
chromosomes; in the human genome, DNA 
alignments at the whole-chromosome level 
are insufficiently sensitive. Therefore, a mod- 
ified procedure was developed and applied, 
as follows. First, all 26,588 proteins 
(9,675,713 amino acids) were concatenated
end-to-end in order as they occur
along each of the 24 chromosomes, irrespective
of strand location. The concatenated protein
set was then aligned against each chromosome
by the MUMmer algorithm. The
resulting matches were clustered to extract all 
sets of three or more protein matches that 
occur in close proximity on two different 
chromosomes (93); these represent the can- 
didate segmental duplications. A series of 
filters were developed and applied to remove 
likely false-positives from this set; for exam- 
ple, small blocks that were spread across 
many proteins were removed. To refine the
filtering methods, a shuffled protein set was
created by taking the 26,588 proteins, 
randomizing their order, and then partitioning 
them into 24 shuffled chromosomes, each 
containing the same number of proteins as the 
true genome. This shuffled protein set has the 
identical composition to the real genome; in
particular, every protein and every domain 
appears the same number of times. The complete
algorithm was then applied to both the
real and the shuffled data, with the results on
the shuffled data being used to estimate the
false-positive rate. The algorithm after filter- 
ing yielded 10,310 gene pairs in 1077 dupli- 
cated blocks containing 3522 distinct genes; 
tandemly duplicated expansions in many of
the blocks explain the excess of gene pairs to 
distinct genes. In the shuffled data, by con- 
trast, only 370 gene pairs were found, giving 
a false-positive estimate of 3.6%. The most 
likely explanation for the 1077 block dupli- 
cations is ancient segmental duplications. In 
many cases, the order of the proteins has been 
shuffled, although proximity is preserved. 
Out of the 1077 blocks, 159 contain only
three genes, 137 contain four genes, and 781
contain five or more genes. 
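
The shuffled-genome control described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, proteins are represented by arbitrary labels, and the false-positive estimate is taken to be the ratio of gene pairs found in the shuffled data to gene pairs found in the real data, which reproduces the 3.6% figure quoted in the text.

```python
import random
from collections import Counter


def shuffle_into_chromosomes(chromosomes, seed=0):
    """Randomize protein order, then re-partition into chromosomes of the original sizes."""
    rng = random.Random(seed)
    pool = [protein for chrom in chromosomes for protein in chrom]
    rng.shuffle(pool)
    shuffled, start = [], 0
    for chrom in chromosomes:
        shuffled.append(pool[start:start + len(chrom)])
        start += len(chrom)
    return shuffled


def false_positive_rate(pairs_in_shuffled, pairs_in_real):
    """Fraction of detected gene pairs expected to arise by chance."""
    return pairs_in_shuffled / pairs_in_real


# Toy genome with three "chromosomes" (labels invented for illustration).
real = [["a", "b", "c"], ["d", "e"], ["f"]]
control = shuffle_into_chromosomes(real, seed=1)

# Counts quoted in the text: 370 pairs in shuffled data, 10,310 in real data.
fp_percent = round(100 * false_positive_rate(370, 10310), 1)
```

Because the control keeps every protein exactly once and keeps each chromosome's protein count, any block found in it reflects composition alone rather than conserved gene order.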

To illustrate the extent of the detected 
duplications, Fig. 13 shows all 1077 block 
duplications indexed to each chromosome in 
24 panels in which only duplications mapped 
to the indexed chromosome are displayed. 
The figure makes it clear that the duplications 
are ubiquitous in the genome. One feature 
that it displays is many relatively small chromosomal
stretches with one-to-many duplication
relationships that are graphically striking.
One such example captured by the analysis
is the well-documented olfactory receptor
(OR) family, which is scattered in blocks
throughout the genome and which has been
analyzed for genome-deployment reconstructions
at several evolutionary stages (94). The
figure also illustrates that some chromosomes,
such as chromosome 2, contain many
more detected large-scale duplications than
others. Indeed, one of the largest duplicated
segments is a large block of 33 proteins on 
chromosome 2, spread among eight smaller 
blocks in 2p, that aligns to a paralogous set on 
chromosome 14, with one rearrangement (see 
chromosomes 2 and 14 panels in Fig. 13). 
The proteins are not contiguous but span a
region containing 97 proteins on chromosome
2 and 332 proteins on chromosome 14.
The likelihood of observing this many duplicated
proteins by chance, even over a span of
this length, is 2.3 × 10^-68 (93). This duplicated
set spans 20 Mbp on chromosome 2 and
63 Mbp on chromosome 14, over 70% of the 
latter chromosome. Chromosome 2 also con- 
tains a block duplication that is nearly as 
large, which is shared by chromosome arm 2q
and chromosome 12. This duplication incor- 
porates two of the four known Hox gene 
clusters, but considerably expands the extent 
of the duplications proximally and distally on 
the pair of chromosome arms. This breadth of 
duplication is also seen on the two chromo- 
somes carrying the other two Hox clusters. 

An additional large duplication, between
chromosomes 18 and 20, serves as a good
example to illustrate some of the features
common to many of the other observed large
duplications (Fig. 13, inset). This duplication
contains 64 detected ordered intrachromosomal
pairs of homologous genes. After discounting
a 40-Mb stretch of chromosome 18
free of matches to chromosome 20, which is
likely to represent a large insert (between the
gene assignments "Krup rel" and "collagen
rel" on chromosome 18 in Fig. 13), the full
duplication segment covers 36 Mb on chromosome
18 and 28 Mb on chromosome 20.
By this measure, the duplication segment
spans nearly half of each chromosome's net
length. The most likely scenario is that the
whole span of this region was duplicated as a
single very large block, followed by shuffling
owing to smaller scale rearrangements. As
such, at least four subsequent rearrangements
would need to be invoked to explain the
relative insertions and inversions seen in the
duplicated segment interval. The 64 protein
pairs in this alignment occur among 217 protein
assignments on chromosome 18, and
among 322 protein assignments on chromosome
20, for a density of involved proteins of
20 to 30%. This is consistent with an ancient
large-scale duplication followed by subsequent
gene loss on one or both chromosomes.
Loss of just one member of a gene pair
subsequent to the duplication would result in
a failure to score a gene pair in the block; less
than 50% gene loss on the chromosomes
would lead to the duplication density observed
here. As an independent verification
of the significance of the alignments detected,
it can be seen that a substantial number of
the pairs of aligning proteins in this duplication,
including some of those annotated (Fig.
13), are those populating small Lek complete
clusters (see above). This indicates that they
are members of very small families of paralogs;
their relative scarcity within the genome
validates the uniqueness and robust nature of
their alignments.

Fig. 12. Gene duplication in complete protein clusters. The predicted protein sets of human, worm,
and fly were subjected to Lek clustering (27). The numbers of clusters with varying ratios (whole
number) of human versus worm and human versus fly proteins per cluster were plotted.
[Plot x-axis: ratio, from human predominant (5:1, 4:1, 3:1) to fly/worm predominant (1:3, 1:4, 1:5);
series: Human/Worm and Human/Fly.]

Two additional qualitative features were observed
among many of the large-scale duplications.
First, several proteins with disease associations,
with OMIM (Online Mendelian Inheritance
in Man) assignments, are members of
duplicated segments (see web table 2 on Science
Online at www.sciencemag.org/cgi/content/full/291/5507/1304/DC1).
We have also
observed a few instances where paralogs on
both duplicated segments are associated with 
similar disease conditions. Notable among 
these genes are proteins involved in hemostasis 
(coagulation factors) that are associated with 
bleeding disorders, transcriptional regulators 
like the homeobox proteins associated with de- 
velopmental disorders, and potassium channels 
associated with cardiovascular conduction ab- 
normalities. For each of these disease genes, 
closer study of the paralogous genes in the 
duplicated segment may reveal new insights 
into disease causation, with further investiga- 
tion needed to determine whether they might be 
involved in the same or similar genetic diseases. 
Second, although there is a conserved number 
of proteins and coding exons predicted for spe- 
cific large duplicated spans within the chromo- 
some 18 to 20 alignment, the genomic DNA of 
chromosome 18 in these specific spans is in 
some cases more than 10-fold longer than the 
corresponding chromosome 20 DNA. This selective
accretion of noncoding DNA (or conversely,
loss of noncoding DNA) on one of a
pair of duplicated chromosome regions was
observed in many compared regions. Hypotheses
to explain which mechanisms foster these
processes must be tested.

Evaluation of the alignment results gives
some perspective on dating of the duplications.
As noted above, large-scale ancient segmental
duplication in fact best explains many of the
blocks detected by this genome-wide analysis.
The regions of human chromosomes involved
in the large-scale duplications expanded upon
above (chromosomes 2 to 14, 2 to 12, and 18 to
20) are each syntenic to a distinct mouse chromosomal
region. The corresponding mouse
chromosomal regions are much more similar in
sequence conservation, and even in order, to
their human synteny partners than the human
duplication regions are to each other. Further,
the corresponding mouse chromosomal regions
each bear a significant proportion of genes orthologous
to the human genes on which the
human duplication assignments were made. On
the basis of these factors, the corresponding
mouse chromosomal spans, at coarse resolution,
appear to be products of the same large-scale
duplications observed in humans. Although
further detailed analysis must be carried
out once a more complete genome is assembled
for mouse, the underlying large duplications
appear to predate the two species' divergence.
This dates the duplications, at the latest, before
divergence of the primate and rodent lineages.
This date can be further refined upon examination
of the synteny between human chromosomes
and those of chicken, pufferfish (Fugu
rubripes), or zebrafish (95). The only substantial
syntenic stretches mapped in these
species corresponding to both pairs of human
duplications are restricted to the Hox cluster
regions. When the synteny of these regions
(or others) to human chromosomes is extend- 
ed with further mapping, the ages of the 
nearly chromosome-length duplications seen 
in humans are likely to be dated to the root of 
vertebrate divergence. 

The MUMmer-based results demonstrate 
large block duplications that range in size from 
a few genes to segments covering most of a 
chromosome. The extent of segmental duplica- 
tions raises the question of whether an ancient 
whole-genome duplication event is the under- 
lying explanation for the numerous duplicated 
regions (96). The duplications have undergone 
many deletions and subsequent rearrangements; 
these events make it difficult to distinguish 
between a whole-genome duplication and mul- 
tiple smaller events. Further analysis, focused 
especially on comparing the estimated ages of 
all the block duplications, derived partially 
from interspecies genome comparisons, will be 
necessary to determine which of these two hy- 
potheses is more likely. Comparisons of ge- 
nomes of different vertebrates, and even cross- 
phyla genome comparisons, will allow for the 
deconvolution of duplications to eventually reveal
the stagewise history of our genome, and
with it a history of the emergence of many of
the key functions that distinguish us from other
living things.

6 A Genome-Wide Examination of 
Sequence Variations 

Summary. Computational methods were used
to identify single-nucleotide polymorphisms
(SNPs) by comparison of the Celera sequence
to other SNP resources. The SNP rate between
two chromosomes was ~1 per 1200 to
1500 bp. SNPs are distributed nonrandomly
throughout the genome. Only a very small
proportion of all SNPs (<1%) potentially
impact protein function based on the functional
analysis of SNPs that affect the predicted
coding regions. This results in an estimate
that only thousands, not millions, of
genetic variations may contribute to the structural
diversity of human proteins.

Having a complete genome sequence enables
researchers to achieve a dramatic acceleration
in the rate of gene discovery, but only through
analysis of sequence variation in DNA can we
discover the genetic basis for variation in health
among human beings. Whole-genome shotgun
sequencing is a particularly effective method
for detecting sequence variation in tandem with
whole-genome assembly. In addition, we compared
the distribution and attributes of SNPs
ascertained by three other methods: (i) alignment
of the Celera consensus sequence to the
PFP assembly, (ii) overlap of high-quality reads
of genomic sequence (referred to as "Kwok";
1,120,195 SNPs) (97), and (iii) reduced representation
shotgun sequencing (referred to as
"TSC"; 632,640 SNPs) (98). These data were
consistent in showing an overall nucleotide diversity
of ~8 × 10^-4, marked heterogeneity
across the genome in SNP density, and an
overwhelming preponderance of noncoding
variation that produces no change in expressed
proteins.

6.1 SNPs found by aligning the Celera 
consensus to the PFP assembly 

Ideally, methods of SNP discovery make full 
use of sequence depth and quality at every site, 
and quantitatively control the rate of false-pos- 
itive and false-negative calls with an explicit 
sampling model (99). Comparison of consensus 
sequences in the absence of these details necessitated
a more ad hoc approach (quality scores
could not readily be obtained for the PFP assembly).
First, all sequence differences between
the two consensus sequences were identified;
these were then filtered to reduce the contribution
of sequencing errors and misassembly. As
a measure of the effectiveness of the filtering
step, we monitored the ratio of transition and
transversion substitutions, because a 2:1 ratio
has been well documented as typical in mammalian
evolution (100) and in human SNPs
(101, 102). The filtering steps consisted of removing
variants where the quality score in the
Celera consensus was less than 30 and where
the density of variants was greater than 5 in 400
bp. These filters resulted in shifting the transition-to-transversion
ratio from 1.57:1 to
1.89:1. When applied to 2.3 Gbp of alignments
between the Celera and PFP consensus sequences,
these filters resulted in identification
of 2,104,820 putative SNPs from a total of 
2,778,474 substitution differences. Overlaps 
between this set of SNPs and those found by 
other methods are described below. 
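
The two filters and the transition-to-transversion check described above can be sketched as follows. This is a simplified illustration, not the authors' procedure: the tuple layout and function names are hypothetical, the density rule is interpreted as "any run of six variants spanning less than 400 bp", and density is computed before the quality filter, an ordering the text does not state.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(variants):
    """Ratio of transitions (purine<->purine, pyrimidine<->pyrimidine) to transversions."""
    ts = sum(1 for _, _, ref, alt in variants if (ref, alt) in TRANSITIONS)
    tv = len(variants) - ts
    return ts / tv if tv else float("inf")


def filter_variants(variants, min_quality=30, max_in_window=5, window=400):
    """Drop variants with quality < 30 or lying in a stretch with > 5 variants per 400 bp.

    variants: list of (position, quality, ref_base, alt_base) tuples.
    """
    variants = sorted(variants)
    positions = [v[0] for v in variants]
    dense = [False] * len(variants)
    k = max_in_window  # k + 1 variants inside one window counts as too dense
    for i in range(len(variants) - k):
        if positions[i + k] - positions[i] < window:
            for j in range(i, i + k + 1):
                dense[j] = True
    return [v for v, d in zip(variants, dense) if v[1] >= min_quality and not d]


# Toy data (invented): a dense cluster, one low-quality call, three clean calls.
example = [(p, 40, "A", "C") for p in (100, 150, 200, 250, 300, 350)]
example += [(5000, 40, "A", "G"), (9000, 20, "C", "A"),
            (12000, 35, "C", "T"), (15000, 50, "A", "C")]
kept = filter_variants(example)
ratio_after = ts_tv_ratio(kept)

# Fraction of substitution differences retained, from the counts in the text.
retained = 2104820 / 2778474
```

With the counts quoted in the text, roughly three quarters of the raw substitution differences survive the filters.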

6.2 Comparisons to public SNP 
databases 

Additional SNPs, including 2,536,021 from 
dbSNP (www.ncbi.nlm.nih.gov/SNP) and 
13,150 from HGMD (Human Gene Muta- 
tion Database, from the University of 
Wales, UK), were mapped on the Celera consensus
sequence by a sequence similarity
search with the program PowerBlast (103). The
two largest data sets in dbSNP are the Kwok 
and TSC sets, with 47% and 25% of the dbSNP 
records. Low-quality alignments with partial 
coverage of the dbSNP sequence and align- 
ments that had less than 98% sequence identity 
between the Celera sequence and the dbSNP 
flanking sequence were eliminated. dbSNP sequences
mapping to multiple locations on the
Celera genome were discarded. A total of
2,336,935 dbSNP variants were mapped to
1,223,038 unique locations on the Celera sequence,
implying considerable redundancy in
dbSNP. SNPs in the TSC set mapped to
585,811 unique genomic locations, and SNPs in
the Kwok set mapped to 438,032 unique locations.
The combined unique SNP count used
in this analysis, including Celera-PFP, TSC,
and Kwok, is 2,737,668. Table 15 shows that a
substantial fraction of SNPs identified by one of
these methods was also found by another method.
The very high overlap (36.2%) between the
Kwok and Celera-PFP SNPs may be due in part
to the use by Kwok of sequences that went into
the PFP assembly. The unusually low overlap
(16.4%) between the Kwok and TSC sets is due
to their being the smallest two sets. In addition,
24.5% of the Celera-PFP SNPs overlap with
SNPs derived from the Celera genome sequences
(46). SNP validation in population
samples is an expensive and laborious process,
so confirmation on multiple data sets may provide
an efficient initial validation "in silico" (by
computational analysis).

Table 15. Overlap of SNPs from genome-wide
SNP databases. Table entries are SNP counts for
each pair of data sets. Numbers in parentheses are
the fraction of overlap, calculated as the count of
overlapping SNPs divided by the number of SNPs
in the smaller of the two databases compared.
Total SNP counts for the databases are: Celera-PFP,
2,104,820; TSC, 585,811; and Kwok, 438,032.
Only unique SNPs in the TSC and Kwok data sets
were included.
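
The overlap fraction used for Table 15 (shared SNPs divided by the size of the smaller data set) can be sketched as below. This is an illustrative example only: the function name is hypothetical, SNPs are represented as toy (chromosome, position) keys, and the implied Kwok/TSC overlap count is a back-calculation from the percentages in the text rather than a value read from the table.

```python
def overlap_fraction(set_a, set_b):
    """Shared SNPs divided by the number of SNPs in the smaller of the two sets."""
    shared = len(set_a & set_b)
    return shared / min(len(set_a), len(set_b))


# Toy SNP sets keyed by (chromosome, position); values invented for illustration.
kwok_toy = {("chr1", 100), ("chr1", 250), ("chr2", 40), ("chr3", 7)}
tsc_toy = {("chr1", 100), ("chr2", 40), ("chr2", 90), ("chr4", 12), ("chr5", 3)}
toy_fraction = overlap_fraction(kwok_toy, tsc_toy)

# Approximate overlap count implied by the text: 16.4% of the smaller (Kwok) set.
implied_kwok_tsc_overlap = round(0.164 * 438032)
```

Dividing by the smaller set keeps the fraction between 0 and 1 regardless of how different the two set sizes are.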

One means of assessing whether the
three sets of SNPs provide the same picture
of human variation is to tally the frequencies
of the six possible base changes in
each set of SNPs (Table 16). Previous measures
of nucleotide diversity were mostly
derived from small-scale analysis on candidate
genes (101), and our analysis with
all three data sets validates the previous
observations at the whole-genome scale.
There is remarkable homogeneity between
the SNPs found in the Kwok set, the TSC
set, and in our whole-genome shotgun (46)
in this substitution pattern. Compared with
the rest of the data sets, Celera-PFP deviates
slightly from the 2:1 transition-to-transversion
ratio observed in the other
SNP sets. This result is not unexpected,
because some fraction of the computationally
identified SNPs in the Celera-PFP
comparison may in fact be sequence errors.
A 2:1 transition:transversion ratio for the
bona fide SNPs would be obtained if one
assumed that 15% of the sequence differences
in the Celera-PFP set were a result of
(presumably random) sequence errors.

6.3 Estimation of nucleotide diversity 
from ascertained SNPs 

The number of SNPs identified varied
widely across chromosomes. In order to
normalize these values to the chromosome
size and sequence coverage, we used π, the
standard statistic for nucleotide diversity
(104). Nucleotide diversity is a measure of
per-site heterozygosity, quantifying the
probability that a pair of chromosomes
drawn from the population will differ at a
nucleotide site. In order to calculate nucleotide
diversity for each chromosome, we
need to know the number of nucleotide
sites that were surveyed for variation, and
in methods like reduced representation sequencing,
we need to know the sequence
quality and the depth of coverage at each
site. These data are not readily available, so
we could not estimate nucleotide diversity
from the TSC effort. Estimation of nucleotide
diversity from high-quality sequence
overlaps should be possible, but again
more information is needed on the details
of all the alignments.

Estimation of nucleotide diversity from a
shotgun assembly entails calculating, for each
column of the multialignment, the probability
that two or more distinct alleles are present
and the probability of detecting a SNP if in
fact the alleles have different sequence (i.e.,
the probability of correct sequence calls). The
greater the depth of coverage and the higher
the sequence quality, the higher is the chance
of successfully detecting a SNP (105). Even
after correcting for variation in coverage, the
nucleotide diversity appeared to vary across
autosomes. The significance of this heterogeneity
was tested by analysis of variance, with
estimates of π for 100-kbp windows to estimate
variability within chromosomes (for the
Celera-PFP comparison, F = 29.73, P <
0.0001).

Average diversity for the autosomes estimated
from the Celera-PFP comparison
was 8.94 × 10^-4. Nucleotide diversity on
the X chromosome was 6.54 × 10^-4. The
X is expected to be less variable than autosomes,
because for every four copies of
autosomes in the population, there are only
three X chromosomes, and this smaller effective
population size means that random
drift will more rapidly remove variation
from the X (106).
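
The three-to-four argument above can be checked against the two diversity estimates with a short calculation. This sketch assumes a simple neutral model in which expected diversity scales with the number of chromosome copies in the population; the variable names are illustrative and the model is a simplification, not a claim about the authors' analysis.

```python
# Diversity estimates quoted in the text (Celera-PFP comparison).
pi_autosome = 8.94e-4
pi_x = 6.54e-4

# Observed ratio of X to autosomal diversity.
observed_ratio = pi_x / pi_autosome

# Simplified expectation: three X chromosomes for every four autosome copies.
expected_ratio = 3 / 4

difference = expected_ratio - observed_ratio
```

The observed ratio, about 0.73, sits slightly below the 0.75 expected under this simplified model.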

Having ascertained nucleotide variation
genome-wide, it appears that previous estimates
of nucleotide diversity in humans
based on samples of genes were reasonably
accurate (101, 102, 106, 107). Genome-wide,
our estimate of nucleotide diversity was
8.98 × 10^-4 for the Celera-PFP alignment,
and a published estimate averaged over 10
densely resequenced human genes was
8.00 × 10^-4 (108).



6.4 Variation in nucleotide diversity
across the human genome 

Table 16. Summary of nucleotide changes in different SNP data sets.
2.07:1

Fig. 13. Segmental duplications between chromosomes
in the human genome. The 24 panels show
the 1077 duplicated blocks of genes, containing 10,310
pairs of genes in total. Each line represents a pair of
homologous genes belonging to a block; all blocks contain
at least three genes on each of the chromosomes
where they appear. Each panel shows all the
duplications between a single chromosome and
other chromosomes with shared blocks. The chromosome
at the center of each panel is shown as a
thick red line for emphasis. Other chromosomes are
displayed from top to bottom within each panel
ordered by chromosome number. The inset (bottom,
center right) shows a close-up of one duplication
between chromosomes 18 and 20, expanded
to display the gene names of 12 of the 64
gene pairs shown.

Such an apparently high degree of variability
among chromosomes in SNP density
raises the question of whether there is heterogeneity
at a finer scale within chromosomes,
and whether this heterogeneity is
greater than expected by chance. If SNPs
occur by random and independent mutations,
then it would seem that there ought to be a
Poisson distribution of numbers of SNPs in
fragments of arbitrary constant size. The observed
dispersion in the distribution of SNPs
in 100-kbp fragments was far greater than
predicted from a Poisson distribution (Fig.
14). However, this simplistic model ignores
the different recombination rates and population
histories that exist in different regions of
the genome. Population genetics theory holds
that we can account for this variation with a
mathematical formulation called the neutral
coalescent (109). Applying well-tested algorithms
for simulating the neutral coalescent
with recombination (110), and using an effective
population size of 10,000 and a per-base
recombination rate equal to the mutation
rate (111), we generated a distribution of numbers
of SNPs by this model as well (112). The
observed distribution of SNPs has a much larger
variance than either the Poisson model or the
coalescent model, and the difference is highly
significant. This implies that there is significant
variability across the genome in SNP density,
an observation that begs an explanation.
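
One simple way to see the kind of excess dispersion described above is the index of dispersion (variance divided by mean) of SNP counts per window, which is expected to be close to 1 for Poisson-distributed counts. The sketch below is illustrative only: the function name and the window counts are invented, and the authors' actual comparison used fitted Poisson and coalescent distributions rather than this single statistic.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by mean; close to 1 for Poisson-distributed counts."""
    return variance(counts) / mean(counts)


# Invented SNP counts per 100-kbp window, both with a mean of 80.
tightly_clustered = [80, 82, 79, 81, 78]
overdispersed = [20, 150, 60, 95, 75]

d_tight = dispersion_index(tightly_clustered)
d_over = dispersion_index(overdispersed)
```

A value far above 1, as in the second toy series, corresponds to more window-to-window variation in SNP density than independent random mutation would produce.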

Several attributes of the DNA sequence 
may affect the local density of SNPs, in- 
cluding the rate at which DNA polymerase 
makes errors and the efficacy of mismatch 
repair. One key factor that is likely to be 
associated with SNP density is the G+C 
content, in part because methylated cytosines
in CpG dinucleotides tend to undergo
deamination to form thymine, accounting
for a nearly 10-fold increase in the
mutation rate of CpGs over other dinucleotides.
We tallied the GC content and nucleotide
diversities in 100-kbp windows
across the entire genome and found that the
correlation between them was positive (r =
0.21) and highly significant (P < 0.0001),
but G+C content accounted for only a
small part of the variation.
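
The statement that G+C content explains only a small part of the variation follows from squaring the correlation coefficient. The sketch below shows that step, together with a plain Pearson correlation for reference; the helper name and the toy window values are assumptions made for illustration and are not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Correlation reported in the text between GC content and nucleotide diversity.
r_reported = 0.21
variance_explained = r_reported ** 2   # coefficient of determination

# Toy per-window values (invented) to exercise the helper.
gc_toy = [0.38, 0.41, 0.45, 0.52]
pi_toy = [7.6e-4, 8.2e-4, 9.0e-4, 10.4e-4]
r_toy = pearson_r(gc_toy, pi_toy)
```

With r = 0.21, the squared correlation is about 0.044, so roughly 4% of the window-to-window variance in diversity is associated with G+C content under a linear model.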

6.5 SNPs by genomic class 

To test homogeneity of SNP densities
across functional classes, we partitioned
sites into intergenic (defined as >5 kbp
from any predicted transcription unit), 5'-UTR,
exonic (missense and silent), intronic,
and 3'-UTR for 10,239 known
genes, derived from the NCBI RefSeq database
and all human genes predicted from
the Celera Otto annotation. In coding regions,
SNPs were categorized as either silent,
for those that do not change amino
acid sequence, or missense, for those that
change the protein product. The ratio of
missense to silent coding SNPs in Celera-PFP,
TSC, and Kwok sets (1.12, 0.91, and
0.78, respectively) shows a markedly reduced
frequency of missense variants compared
with the neutral expectation, consistent
with the elimination by natural selection
of a fraction of the deleterious amino
acid changes (112). These ratios are comparable
to the missense-to-silent ratios of
0.88 and 1.17 found by Cargill et al. (101)
and by Halushka et al. (102). Similar results
were observed in SNPs derived from
Celera shotgun sequences (46).

It is striking how small is the fraction of
SNPs that lead to potentially dysfunctional
alterations in proteins. In the 10,239 RefSeq
genes, missense SNPs were only about
0.12, 0.14, and 0.17% of the total SNP
counts in Celera-PFP, TSC, and Kwok
SNPs, respectively. Nonconservative protein
changes constitute an even smaller fraction
of missense SNPs (47, 41, and 40% in
Celera-PFP, Kwok, and TSC). Intergenic regions
have been virtually unstudied (113), and
we note that 75% of the SNPs we identified
were intergenic (Table 17). The SNP rate was
highest in introns and lowest in exons. The SNP
rate was lower in intergenic regions than in
introns, providing one of the first discriminators
between these two classes of DNA. These SNP
rates were confirmed in the Celera SNPs, which
also exhibited a lower rate in exons than in
introns, and in extragenic regions than in introns
(46). Many of these intergenic SNPs will
provide valuable information in the form of
markers for linkage and association studies, and
some fraction is likely to have a regulatory
function as well.

Fig. 14. SNP density in each 100-kbp interval as determined with Celera-PFP SNPs. The color codes
are as follows: black, Celera-PFP SNP density; blue, coalescent model; and red, Poisson distribution.
The figure shows that the distribution of SNPs along the genome is nonrandom and is not entirely
accounted for by a coalescent model of regional history.
[Plot x-axis: number of SNPs per 100 kb.]

7 An Overview of the Predicted Protein-Coding Genes in the Human Genome

Summary. This section provides an initial computational analysis of the predicted protein
set with the aim of cataloging prominent differences and similarities when the human genome
is compared with other fully sequenced eukaryotic genomes. Over 40% of the predicted
protein set in humans cannot be ascribed a molecular function by methods that assign
proteins to known families. A protein domain-based analysis provides a detailed catalog of
the prominent differences in the human genome when compared with the fly and worm genomes.
Prominent among these are domain expansions in proteins involved in developmental
regulation and in cellular processes such as neuronal function, hemostasis, acquired
immune response, and cytoskeletal complexity. The final enumeration of protein families
and details of protein structure will rely on additional experimental work and
comprehensive manual curation.

A preliminary analysis of the predicted human protein-coding genes was conducted. Two
methods were used to analyze and classify the molecular functions of 26,588 predicted
proteins that represent 26,383 gene predictions with at least two lines of evidence as
described above. The first method was based on an analysis at the level of protein
families, with both the publicly available Pfam database (114, 115) and Celera's Panther
Classification (CPC) (Fig. 15) (116). The second method was based on an analysis at the
level of protein domains, with both the Pfam and SMART databases (115, 117).

The results presented here are preliminary and are subject to several limitations. Both
the gene predictions and functional assignments have been made by using computational
tools, although the statistical models in Panther, Pfam, and SMART have been built,
annotated, and reviewed by expert biologists. In the set of computationally predicted
genes, we expect both false-positive predictions (some of these may in fact be inactive
pseudogenes) and false-negative predictions (some human genes will not be computationally
predicted). We also expect errors in delimiting the boundaries of exons and genes.
Similarly, in the automatic functional assignments, we also expect both false-positive and
false-negative predictions. The functional assignment protocol focuses on protein families
that tend to be found across several organisms, or on families of known human genes.
Therefore, we do not assign a function to many genes that are not in large families, even
if the function is known. Unless otherwise specified, all enumeration of the genes in any
given family or functional category was taken from the set of 26,588 predicted proteins,
which were assigned functions by using statistical score cutoffs defined for models in
Panther, Pfam, and SMART.

For this initial examination of the predicted human protein set, three broad questions
were asked: (i) What are the likely molecular functions of the predicted gene products,
and how are these proteins categorized with current classification methods? (ii) What are
the core functions that appear to be common across the animals? (iii) How does the human
protein complement differ from that of other sequenced eukaryotes?

7.1 Molecular functions of predicted 
human proteins 

Figure 15 shows an overview of the putative molecular functions of the predicted 26,588
human proteins that have at least two lines of supporting evidence. About 41% (12,809) of
the gene products could not be classified from this initial analysis and are termed
proteins with unknown functions. Because our automatic classification methods treat only
relatively large protein families, there are a number of "unclassified" sequences that do,
in fact, have a known or predicted function. For the 60% of the protein set that have
automatic functional predictions, the specific protein functions have been placed into
broad classes. We focus here on molecular function (rather than higher order cellular
processes) in order to classify as many proteins as possible. These functional predictions
are based on similarity to sequences of known function.

In our analysis of the 12,731 additional low-confidence predicted genes (those with only
one piece of supporting evidence), only 636 (5%) of these additional putative genes were
assigned molecular functions by the automated methods. One-third of these 636 predicted
genes represented endogenous retroviral proteins, further suggesting that the majority of
these unknown-function genes are not real genes. Given that most of these additional
12,095 genes appear to be unique among the genomes sequenced to date, many may simply
represent false-positive gene predictions.

The most common molecular functions are the transcription factors and those involved in
nucleic acid metabolism (nucleic acid enzyme). Other functions that are highly represented
in the human genome are the receptors, kinases, and hydrolases. Not surprisingly, most of
the hydrolases are proteases. There are also many proteins that are members of
proto-oncogene families, as well as families of "select regulatory molecules": (i)
proteins involved in specific steps of signal transduction such as heterotrimeric
GTP-binding proteins (G proteins) and cell cycle regulators, and (ii) proteins that
modulate the activity of kinases, G proteins, and phosphatases.

Table 17. Distribution of SNPs in classes of genomic regions.

Genomic region class     Size of region examined (Mb)    Celera-PFP SNP density (SNP/Mb)
Intergenic               2185                            707
Gene (intron + exon)     646                             917
Intron                   615                             921
First intron             164                             808
Exon                     31                              529
First exon               10                              592


Fig. 15. Distribution of the molecular functions of 26,383 human genes. Each slice lists
the numbers and percentages (in parentheses) of human gene functions assigned to a given
category of molecular function. The outer circle shows the assignment to molecular
function categories in the Gene Ontology (GO) (779), and the inner circle shows the
assignment to Celera's Panther molecular function categories (116).

Slice labels:
cell adhesion (577, 1.9%)
miscellaneous (1318, 4.3%)
viral protein (100, 0.3%)
transfer/carrier protein (203, 0.7%)
transcription factor (1850, 6.0%)
nucleic acid enzyme (2308, 7.5%)
signaling molecule (376, 1.2%)
receptor (1543, 5.0%)
kinase (868, 2.8%)
select regulatory molecule (988, 3.2%)
transferase (610, 2.0%)
synthase and synthetase (313, 1.0%)
oxidoreductase (656, 2.1%)
lyase (117, 0.4%)
chaperone (159, 0.5%)
cytoskeletal structural protein (876, 2.8%)
extracellular matrix (437, 1.4%)
immunoglobulin (264, 0.9%)
ion channel (406, 1.3%)
motor (376, 1.2%)
structural protein of muscle (296, 1.0%)
protooncogene (902, 2.9%)
select calcium binding protein (34, 0.1%)
intracellular transporter (350, 1.1%)
transporter (533, 1.7%)
ligase (56, 0.2%)
isomerase (163, 0.5%)
hydrolase (1227, 4.0%)
molecular function unknown (12809, 41.7%)
7.2 Evolutionary conservation of core processes

Because of the various "model organism" genome-sequencing projects that have already been
completed, reasonable comparative information is available for beginning the analysis of
the evolution of the human genome. The genomes of S. cerevisiae ("bakers' yeast") (118)
and two diverse invertebrates, C. elegans (a nematode worm) (119) and D. melanogaster
(fly) (26), as well as the first plant genome, A. thaliana, recently completed (92),
provide a diverse background for genome comparisons.

We enumerated the "strict orthologs" conserved between human and fly, and between human
and worm (Fig. 16) to address the question, What are the core functions that appear to be
common across the animals? The concept of orthology is important because if two genes are
orthologs, they can be traced by descent to the common ancestor of the two organisms (an
"evolutionarily conserved protein set"), and therefore are likely to perform similar
conserved functions in the different organisms. It is critical in this analysis to
separate orthologs (a gene that appears in two organisms by descent from a common
ancestor) from paralogs (a gene that appears in more than one copy in a given organism by
a duplication event) because paralogs may subsequently diverge in function. Following the
yeast-worm ortholog comparison in (120), we identified two different cases for each
pairwise comparison (human-fly and human-worm). The first case was a pair of genes, one
from each organism, for which there was no other close homolog in either organism. These
are straightforwardly identified as orthologous, because there are no additional members
of the families that complicate separating orthologs from paralogs. The second case is a
family of genes with more than one member in either or both of the organisms being
compared. Chervitz et al. (120) dealt with this case by analyzing a phylogenetic tree that
described the relationships between all of the sequences in both organisms, and then
looking for pairs of genes that were nearest neighbors in the tree. If the
nearest-neighbor pairs were from different organisms, those genes were presumed to be
orthologs. We note that these nearest neighbors can often be confidently identified from
pairwise sequence comparison without having to examine a phylogenetic tree (see legend to
Fig. 16). If the nearest neighbors are not from different organisms, there has been a
paralogous expansion in one or both organisms after the speciation event (and/or a gene
loss by one organism). When this one-to-one correspondence is lost, defining an ortholog
becomes ambiguous. For our initial computational overview of the predicted human protein
set, we could not answer this question for every predicted protein. Therefore, we consider
only "strict orthologs," i.e., the proteins with unambiguous one-to-one relationships
(Fig. 16). By these criteria, there are 2758 strict human-fly orthologs and 2031
human-worm orthologs (1523 in common between these sets). We define the evolutionarily
conserved set as those 1523 human proteins that have strict orthologs in both
D. melanogaster and C. elegans.

Fig. 16. Functions of putative orthologs across vertebrate and invertebrate genomes. Each
slice lists the number and percentages (in parentheses) of "strict orthologs" between the
human, fly, and worm genomes involved in a given category of molecular function. "Strict
orthologs" are defined here as bi-directional BLAST best hits (780) such that each
orthologous pair (i) has a BLASTP P-value of ≤ 10^-10 (120), and (ii) has a more
significant BLASTP score than any paralogs in either organism, i.e., there has likely been
no duplication subsequent to speciation that might make the orthology ambiguous. This
measure is quite strict and is a lower bound on the number of orthologs. By these
criteria, there are 2758 strict human-fly orthologs, and 2031 human-worm orthologs (1523
in common between these sets).

Slice labels:
cytoskeletal structural protein (20, 1.2%)
chaperone (16, 0.9%)
cell adhesion (11, 0.6%)
miscellaneous (72, 4.2%)
viral protein (4, 0.2%)
transfer/carrier protein (11, 0.6%)
transcription factor (81, 4.7%)
nucleic acid enzyme (221, 12.9%)
receptor (23, 1.3%)
kinase (69, 4.0%)
select regulatory molecule (88, 5.1%)
transferase (70, 4.1%)
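The "strict ortholog" criteria (bi-directional best hits that pass a significance cutoff and outscore any within-species paralog) can be sketched as follows. This is a minimal illustration under stated assumptions: the tuple layout `(query, subject, score, significance)`, the function names, and the toy identifiers are hypothetical, and the code is not the procedure actually run for Fig. 16. Ties in score are resolved in favor of the first hit seen.

```python
NEG_INF = float("-inf")


def best_hits(hits, max_sig=1e-10):
    """Map each query to its best-scoring (subject, score) among hits passing the cutoff."""
    best = {}
    for query, subject, score, sig in hits:
        if sig > max_sig:
            continue  # not significant enough to be considered
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def best_paralog_scores(self_hits):
    """Best within-species score for each protein, ignoring self-matches."""
    best = {}
    for query, subject, score, _sig in self_hits:
        if query != subject:
            best[query] = max(score, best.get(query, NEG_INF))
    return best


def strict_orthologs(a_vs_b, b_vs_a, a_vs_a=(), b_vs_b=(), max_sig=1e-10):
    """Pairs (a, b) that are reciprocal best hits and outscore any paralog hit."""
    best_ab = best_hits(a_vs_b, max_sig)
    best_ba = best_hits(b_vs_a, max_sig)
    para_a = best_paralog_scores(a_vs_a)
    para_b = best_paralog_scores(b_vs_b)
    pairs = set()
    for a, (b, score_ab) in best_ab.items():
        back = best_ba.get(b)
        if back is None or back[0] != a:
            continue  # not bi-directional
        if score_ab <= para_a.get(a, NEG_INF) or back[1] <= para_b.get(b, NEG_INF):
            continue  # a paralog matches at least as well, so the pair is ambiguous
        pairs.add((a, b))
    return pairs


def conserved_set(human_fly_pairs, human_worm_pairs):
    """Human proteins with a strict ortholog in both fly and worm."""
    return {h for h, _ in human_fly_pairs} & {h for h, _ in human_worm_pairs}
```

Applying `conserved_set` to the human-fly and human-worm pair sets corresponds to the intersection that the text reports as 1523 proteins.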

The distribution of the functions of the conserved protein set is shown in Fig. 16.
Comparison with Fig. 15 shows that, not surprisingly, the set of conserved proteins is not
distributed among molecular functions in the same way as the whole human protein set.
Compared with the whole human set (Fig. 15), there are several categories that are
overrepresented in the conserved set by a factor of ~2 or more. The first category is
nucleic acid enzymes, primarily the transcriptional machinery (notably DNA/RNA
methyltransferases, DNA/RNA polymerases, helicases, DNA ligases, DNA- and RNA-processing
factors, nucleases, and ribosomal proteins). The basic transcriptional and translational
machinery is well known to have been conserved over evolution, from bacteria through to
the most complex eukaryotes. Many ribonucleoproteins involved in RNA splicing also appear
to be conserved among the animals. Other enzyme types are also overrepresented
(transferases, oxidoreductases, ligases, lyases, and isomerases). Many of these enzymes
are involved in intermediary metabolism. The only exception is the hydrolase category,
which is not significantly overrepresented in the shared protein set. Proteases form the
largest part of this category, and several large protease families have expanded in each
of these three organisms after their divergence. The category of select regulatory
molecules is also overrepresented in the conserved set. The major conserved families are
small guanosine triphosphatases (GTPases) (especially the Ras-related superfamily,
including ADP ribosylation factor) and cell cycle regulators (particularly the cullin
family, cyclin C family, and several cell division protein kinases). The last two
significantly overrepresented categories are protein transport and trafficking, and
chaperones. The most conserved groups in these categories are proteins involved in coated
vesicle-mediated transport, and chaperones involved in protein folding and heat-shock
response [particularly the DNAJ family, and heat-shock protein 60 (HSP60), HSP70, and
HSP90 families]. These observations provide only a conservative estimate of the protein
families in the context of specific cellular processes that were likely derived from the
last common ancestor of the human, fly, and worm. As stated before, this analysis does not
provide a complete estimate of conservation across the three animal genomes, as paralogous
duplication makes the determination of true orthologs difficult within the members of
conserved protein families.

Fig. 16, remaining slice labels:
extracellular matrix (12, 0.7%)
ion channel (7, 0.4%)
motor (13, 0.8%)
structural protein of muscle (8, 0.5%)
protooncogene (23, 1.3%)
intracellular transporter (51, 3.0%)
transporter (44, 2.6%)
synthase and synthetase (64, 3.7%)
oxidoreductase (64, 3.7%)
lyase (12, 0.7%)
hydrolase (80, 4.7%)
ligase (9, 0.5%)
isomerase (21, 1.2%)
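The comparison between Figs. 15 and 16 amounts to a ratio of category percentages. The sketch below works through three categories whose percentages are legible in both figures; the function names and the cutoff of 2.0 (standing in for the text's "factor of ~2") are assumptions for illustration only.

```python
# Percentages as listed in Fig. 15 (whole human set) and Fig. 16 (conserved set).
WHOLE_SET_PCT = {
    "transferase": 2.0,
    "hydrolase": 4.0,
    "intracellular transporter": 1.1,
}
CONSERVED_PCT = {
    "transferase": 4.1,
    "hydrolase": 4.7,
    "intracellular transporter": 3.0,
}


def fold_enrichment(conserved_pct, whole_pct):
    """Ratio of a category's share in the conserved set to its share in the whole set."""
    return {cat: conserved_pct[cat] / whole_pct[cat]
            for cat in conserved_pct
            if cat in whole_pct and whole_pct[cat] > 0}


def overrepresented(ratios, threshold=2.0):
    """Categories whose ratio meets the (assumed) threshold, in alphabetical order."""
    return sorted(cat for cat, ratio in ratios.items() if ratio >= threshold)


ratios = fold_enrichment(CONSERVED_PCT, WHOLE_SET_PCT)
```

On these inputs the transferase and intracellular transporter ratios are about 2.05 and 2.7, while the hydrolase ratio is about 1.2, in line with the statement that hydrolases are not significantly overrepresented.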



7.3 Differences between the human 
genome and other sequenced 
eukaryotic genomes 

To explore the molecular building blocks of 
the vertebrate taxon, we have compared the 
human genome with the other sequenced 
eukaryotic genomes at three levels: molec- 
ular functions, protein families, and protein 
domains. 

Molecular differences can be correlated with phenotypic differences to begin to reveal the
developmental and cellular processes that are unique to the vertebrates. Tables 18 and 19
display a comparison among all sequenced eukaryotic genomes, over selected protein/domain
families (defined by sequence similarity, e.g., the serine-threonine protein kinases) and
superfamilies (defined by shared molecular function, which may include several
sequence-related families, e.g., the cytokines). In these tables we have focused on
(super)families that are either very large or that differ significantly in humans compared
with the other sequenced eukaryote genomes. We have found that the most prominent human
expansions are in proteins involved in (i) acquired immune functions; (ii) neural
development, structure, and functions; (iii) intercellular and intracellular signaling
pathways in development and homeostasis; (iv) hemostasis; and (v) apoptosis.

Acquired immunity. One of the most striking differences between the human genome and the
Drosophila or C. elegans genome is the appearance of genes involved in acquired immunity
(Tables 18 and 19). This is expected, because the acquired immune response is a defense
system that only occurs in vertebrates. We observe 22 class I and 22 class II major
histocompatibility complex (MHC) antigen genes and 114 other immunoglobulin genes in the
human genome. In addition, there are 59 genes in the cognate immunoglobulin receptor
family. At the domain level, this is exemplified by an expansion and recruitment of the
ancient immunoglobulin fold to constitute molecules such as MHC, and of the integrin fold
to form several of the cell adhesion molecules that mediate interactions between immune
effector cells and the extracellular matrix. Vertebrate-specific proteins include the
paracrine immune regulators family of secreted 4-alpha helical bundle proteins, namely the
cytokines and chemokines. Some of the cytoplasmic signal transduction components
associated with cytokine receptor signal transduction are also features that are poorly
represented in the fly and worm. These include protein domains found in the signal
transducer and activator of transcription (STATs), the suppressors of cytokine signaling
(SOCS), and protein inhibitors of activated STATs (PIAS). In contrast, many of the
animal-specific protein domains that play a role in innate immune response, such as the
Toll receptors, do not appear to be significantly expanded in the human genome.

Neural development, structure, and function. In the human genome, as compared with the
worm and fly genomes, there is a marked increase in the number of members of protein
families that are involved in neural development. Examples include neurotrophic factors
such as ependymin, nerve growth factor, and signaling molecules such as semaphorins, as
well as the number of proteins involved directly in neural structure and function such as
myelin proteins, voltage-gated ion channels, and synaptic proteins such as synaptotagmin.
These observations correlate well with the known phenotypic differences between the
nervous systems of these taxa, notably (i) the increase in the number and connectivity of
neurons; (ii) the increase in number of distinct neural cell types (as many as a thousand
or more in human compared with a few hundred in fly and worm) (121); (iii) the increased
length of individual axons; and (iv) the significant increase in glial cell number,
especially the appearance of myelinating glial cells, which are electrically inert
supporting cells differentiated from the same stem cells as neurons. A number of prominent
protein expansions are involved in the processes of neural development. Of the
extracellular domains that mediate cell adhesion, the connexin domain-containing proteins
(122) exist only in humans. These proteins, which are not present in the Drosophila or
C. elegans genomes, appear to provide the constitutive subunits of intercellular channels
and the structural basis for electrical coupling. Pathway finding by axons and neuronal
network formation is mediated through a subset of ephrins and their cognate receptor
tyrosine kinases that act as positional labels to establish topographical projections
(123). The probable biological role for the semaphorins (22 in human compared with 6 in
the fly and 2 in the worm) and their receptors (neuropilins and plexins) is that of axonal
guidance molecules (124). Signaling molecules such as neurotrophic factors and some
cytokines have been shown to regulate neuronal cell survival, proliferation, and axon
guidance (125). Notch receptors and ligands play important roles in glial cell fate
determination and gliogenesis (126).

Other human expanded gene families play key roles directly in neural structure and
function. One example is synaptotagmin (expanded more than twofold in humans relative to
the invertebrates), originally found to regulate synaptic transmission by serving as a
Ca2+ sensor (or receptor) during synaptic vesicle fusion and release (127). Of interest is
the increased co-occurrence in humans of PDZ and the SH3 domains in neuronal-specific
adaptor molecules; examples include proteins that likely modulate channel activity at
synaptic junctions (128). We also noted expansions in several ion-channel families
(Table 19), including the EAG subfamily (related to cyclic nucleotide gated channels), the
voltage-gated calcium/sodium channel family, the inward-rectifier potassium channel
family, and the voltage-gated potassium channel, alpha subunit family. Voltage-gated
sodium and potassium channels are involved in the generation of action potentials in
neurons. Together with voltage-gated calcium channels, they also play a key role in
coupling action potentials to neurotransmitter release, in the development of neurites,
and in short-term memory. The recent observation of a calcium-regulated association
between sodium channels and synaptotagmin may have consequences for the establishment and
regulation of neuronal excitability (129).

Myelin basic protein and myelin-associated glycoprotein are major classes of protein
components in both the central and peripheral nervous system of vertebrates. Myelin P0 is
a major component of peripheral myelin, and myelin proteolipid and myelin oligodendrocyte
glycoprotein are found in the central
nervous system. Mutations in any of these 
Table 18. Domain-based comparative analysis of proteins in H. sapiens (H),
D. melanogaster (F), C. elegans (W), S. cerevisiae (Y), and A. thaliana (A). The predicted
protein set of each of the above eukaryotic organisms was analyzed with Pfam version 5.5
using E value cutoffs of 0.001. The number of proteins containing the specified Pfam
domains as well as the total number of domains (in parentheses) are shown in each column.
Domains were categorized into cellular processes for presentation. Some domains (i.e.,
SH2) are listed in more than one cellular process. Results of the Pfam analysis may differ
from results obtained based on human curation of protein families, owing to the
limitations of large-scale automatic classifications. Representative examples of domains
with reduced counts owing to the stringent E value cutoff used for this analysis are
marked with a double asterisk (**). Examples include short divergent and predominantly
alpha-helical domains, and certain classes of cysteine-rich zinc finger proteins.
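The two numbers reported in each cell of Table 18 (proteins containing a domain, and total domain occurrences in parentheses) can be sketched as a simple tally over domain hits that pass the E value cutoff. The input layout, one `(protein_id, domain_name, evalue)` tuple per domain occurrence, and the function names are assumptions for illustration; the rule that the parenthesized total is omitted when it equals the protein count is inferred from the table's appearance rather than stated in the legend.

```python
from collections import defaultdict


def count_domains(hits, max_evalue=0.001):
    """Return {domain: (n_proteins, n_occurrences)} for hits passing the cutoff.

    hits: iterable of (protein_id, domain_name, evalue), one per domain occurrence.
    """
    proteins = defaultdict(set)
    occurrences = defaultdict(int)
    for protein_id, domain, evalue in hits:
        if evalue > max_evalue:
            continue  # fails the E value cutoff
        proteins[domain].add(protein_id)
        occurrences[domain] += 1
    return {domain: (len(proteins[domain]), occurrences[domain])
            for domain in proteins}


def format_cell(n_proteins, n_occurrences):
    """Render a cell as 'proteins(occurrences)', or just 'proteins' when equal."""
    if n_proteins == n_occurrences:
        return str(n_proteins)
    return f"{n_proteins}({n_occurrences})"
```

A protein with several copies of the same domain raises the parenthesized total but not the protein count, which is how an entry such as 100(550) for the cadherin domain arises.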



Accession number  Domain name       Domain description

Developmental and homeostatic regulators
PF02039           Adrenomedullin    Adrenomedullin
PF00212           ANP               Atrial natriuretic peptide
PF00028           Cadherin          Cadherin domain
PF00214           Calc_CGRP_IAPP    Calcitonin/CGRP/IAPP family
PF01110           CNTF              Ciliary neurotrophic factor
PF01093           Clusterin         Clusterin
PF00029           Connexin          Connexin
PF00976           ACTH_domain       Corticotropin ACTH domain
PF00473           CRF               Corticotropin-releasing factor family
PF00007           Cys_knot          Cystine-knot domain
PF00778           DIX               Dix domain
PF00322           Endothelin        Endothelin family
PF00812           Ephrin            Ephrin
PF01404           EPh_lbd           Ephrin receptor ligand binding domain
PF00167           FGF               Fibroblast growth factor
PF01534           Frizzled          Frizzled/Smoothened family membrane region
PF00236           Hormone6          Glycoprotein hormones
PF01153           Glypican          Glypican
PF01271           Granin            Granin (chromogranin or secretogranin)
PF02058           Guanylin          Guanylin precursor
PF00049           Insulin           Insulin/IGF/Relaxin family
PF00219           IGFBP             Insulin-like growth factor binding proteins
PF02024           Leptin            Leptin
PF00193           Xlink             LINK (hyaluron binding)
PF00243           NGF               Nerve growth factor family
PF02158           Neuregulin        Neuregulin family
PF00184           Hormone5          Neurohypophysial hormones
PF02070           NMU               Neuromedin U
PF00066           Notch             Notch (DSL) domain
PF00865           Osteopontin       Osteopontin
PF00159           Hormone3          Pancreatic hormone peptides
PF01279           Parathyroid       Parathyroid hormone family
PF00123           Hormone2          Peptide hormone
PF00341           PDGF              Platelet-derived growth factor (PDGF)
PF01403           Sema              Sema domain
PF01033           Somatomedin_B     Somatomedin B domain
PF00103           Hormone           Somatotropin
PF02208           Sorb              Sorbin homologous domain
PF02404           SCF               Stem cell factor
PF01034           Syndecan          Syndecan domain
PF00020           TNFR_c6           TNFR/NGFR cysteine-rich region
PF00019           TGF-β             Transforming growth factor β-like domain
PF01099           Uteroglobin       Uteroglobin family
PF01160           Opiods_neuropep   Vertebrate endogenous opioids neuropeptide
PF00110           Wnt               Wnt family of developmental signaling proteins

Hemostasis
PF01821           ANATO             Anaphylotoxin-like domain
PF00386           C1q               C1q domain
PF00200           Disintegrin       Disintegrin
PF00754           F5_F8_type_C      F5/8 type C domain
PF01410           COLFI             Fibrillar collagen C-terminal domain
PF00039           Fn1               Fibronectin type I domain
PF00040           Fn2               Fibronectin type II domain
PF00051           Kringle           Kringle domain
PF01823           MACPF             MAC/Perforin domain
PF00354           Pentaxin          Pentaxin family
PF00277           SAA_proteins      Serum amyloid A protein
PF00084           Sushi             Sushi domain (SCR repeat)
PF02210           TSPN              Thrombospondin N-terminal-like domains
PF01108           Tissue_fac        Tissue factor
PF00868           Transglutamin_N   Transglutaminase family
PF00927           Transglutamin_C   Transglutaminase family

Counts for the developmental and homeostatic regulator domains, as extracted column by
column:
H: 1, 2, 100(550), 3, 1, 3, 14(16), 1, 2, 10(11), 5, 3, 7(8), 12, 23, 9, 1, 14, 3, 1, 7,
   10, 13(23), 3, 4, 1, 1, 3(5), 1, 3, 5(9), 5, 27(29), 5(8), 1, 2, 2, 3, 17(31), 27(28),
   3, 3, 18
F: 0, 0, 14(157), 0, 0, 0, 0, 0, 1, 2, 2, 0, 2, 2, 1, 7, 0, 2, 0, 0, 4, 0, 0, 0, 0, 0, 0,
   0, 2(4), 0, 0, 0, 0, 1, 8(10), 3, 0, 0, 0, 1, 1, 6, 0, 0, 7(10)
W: 0, 0, 16(66), 0, 0, 0, 0, 0, 0, 0, 4, 0, 4, 1, 1, 3, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
   0, 2(6), 0, 0, 0, 0, 0, 3(4), 0, 0, 0, 0, 1, 0, 4, 0, 0, 5
Y: 0 for every domain
A: 0 for every domain

Counts for the hemostasis domains:
Domain name       H         F         W         Y         A
ANATO             6(14)     0         0         0         0
C1q               24        0         0         0         0
Disintegrin       18        2         3         0         0
F5_F8_type_C      15(20)    5(6)      2         0         0
COLFI             10        0         0         0         0
Fn1               5(18)     0         0         0         0
Fn2               11(16)    0         0         0         0
Kringle           15(24)    2         2         0         0
MACPF             6         0         0         0         0
Pentaxin          9         0         0         0         0
SAA_proteins      4         0         0         0         0
Sushi             53(191)   11(42)    8(45)     0         0
TSPN              14        1         0         0         0
Tissue_fac        1         0         0         0         0
Transglutamin_N   6         1         0         0         0
Transglutamin_C   8         1         0         0         0
Table 18 (Continued)

Accession number  Domain name       Domain description

PF00594           Gla               Vitamin K-dependent carboxylation/gamma-carboxyglutamic (GLA) domain

Immune response
PF00711           Defensin_beta     Beta defensin
PF00748           Calpain_inhib     Calpain inhibitor repeat
PF00666           Cathelicidins     Cathelicidins
PF00129           MHC_I             Class I histocompatibility antigen, domains alpha 1 and 2
PF00993           MHC_II_alpha**    Class II histocompatibility antigen, alpha domain
PF00969           MHC_II_beta**     Class II histocompatibility antigen, beta domain
PF00879           Defensin_propep   Defensin propeptide
PF01109           GM_CSF            Granulocyte-macrophage colony-stimulating factor
PF00047           ig                Immunoglobulin domain
PF00143           Interferon        Interferon alpha/beta domain
PF00714           IFN-gamma         Interferon gamma
PF00726           IL10              Interleukin-10
PF02372           IL15              Interleukin-15
PF00715           IL2               Interleukin-2
PF00727           IL4               Interleukin-4
PF02025           IL5               Interleukin-5
PF01415           IL7               Interleukin-7/9 family
PF00340           IL1               Interleukin-1
PF02394           IL1_propep        Interleukin-1 propeptide
PF02059           IL3               Interleukin-3
PF00489           IL6               Interleukin-6/G-CSF/MGF family
PF01291           LIF_OSM           Leukemia inhibitory factor (LIF)/oncostatin (OSM) family
PF00323           Defensins         Mammalian defensin
PF01091           PTN_MK            PTN/MK heparin-binding protein
PF00277           SAA_proteins      Serum amyloid A protein
PF00048           IL8               Small cytokines (intecrine/chemokine), interleukin-8 like
PF01582           TIR               TIR domain
PF00229           TNF               TNF (tumor necrosis factor) family
PF00088           Trefoil           Trefoil (P-type) domain

PI-PY-rho GTPase signaling
PF00779           BTK               BTK motif
PF00168           C2                C2 domain
PF00609           DAGKa             Diacylglycerol kinase accessory domain (presumed)
PF00781           DAGKc             Diacylglycerol kinase catalytic domain (presumed)
PF00610           DEP               Domain found in Dishevelled, Egl-10, and Pleckstrin (DEP)
PF01363           FYVE              FYVE zinc finger
PF00996           GDI               GDP dissociation inhibitor
PF00503           G-alpha           G-protein alpha subunit
PF00631           G-gamma           G-protein gamma like domains
PF00616           RasGAP            GTPase-activator protein for Ras-like GTPase
PF00618           RasGEFN           Guanine nucleotide exchange factor for Ras-like GTPases; N-terminal motif
PF00625           Guanylate_kin     Guanylate kinase
PF02189           ITAM              Immunoreceptor tyrosine-based activation motif
PF00169           PH                PH domain
PF00130           DAG_PE-bind       Phorbol esters/diacylglycerol binding domain (C1 domain)
PF00388           PI-PLC-X          Phosphatidylinositol-specific phospholipase C, X domain
PF00387           PI-PLC-Y          Phosphatidylinositol-specific phospholipase C, Y domain
PF00640           PID               Phosphotyrosine interaction domain (PTB/PID)
PF02192           PI3K_p85B         PI3-kinase family, p85-binding domain
PF00794           PI3K_rbd          PI3-kinase family, ras-binding domain
PF01412           ArfGAP            Putative GTP-ase activating protein for Arf
PF02196           RBD               Raf-like Ras-binding domain
PF02145           Rap_GAP           Rap/ran-GAP
PF00788           RA                Ras association (RalGDS/AF-6) domain
PF00071           Ras               Ras family
PF00617           RasGEF            RasGEF domain
PF00615           RGS               Regulator of G protein signaling domain
PF02197           RIIa              Regulatory subunit of type II PKA R-subunit



11 



3(9) 
2 

18(20) 

5(6) 
7 
3 
1 

381 (930)
7(9) 
1 
1 
1 
1 
1 
1 
1 
7 
1 
1 
2 
2 



0 
0 
0 

0 

0 
0 
0 
0 

125 (291) 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 



0 



0 
0 
0 
0 

0 
0 
0 
0 

67(323) 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 



0 
0 
0 

o 

0 
0 
0 
0 
0 
0 
0 
0 
0 
0 

0 

0 

0 

0 

0 

0 

0 

0 



0 
0 
0 

0

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 

0 



z 


0 


0 


0 


0 


c. 


0 


0 


0 


0 


A 


0 


0 


0 


0 


DC. 


0 


0 


0 


0 


18 


8 


c 


u 


131 1143; 


12 


0 


0 


b 


0 


5(6) 


0 


2 


0 


0 


5 


1 


0 


0 


0 


73(101) 


32 (44) 


24(35) 


6(9) 


66 (90) 


9 


4 


7 


0 


6 


10 


8 


8 


2 


11(12) 


12(13) 


4 


10 


5 


2 


28(30) 


14 


15 


5 


15 


6 


2 


1 


1 


3 


27(30) 


10


20(23) 


2 


5 


16 


5 


5 


1 


0 


11 


5 


8 


3 


0 


9 


2 


3 


5 


0 


12 


8 


7 


1 


4 


3 


0 


0 


0 


0 


193(212) 


72(78) 


65(68)


24 


23 


45(56) 


25(31) 


26(40) 


1(2)


4 


12 


3 


7 


1 


8 


11 


2 


7 


1 


8 


24(27) 


13 


11(12) 


0 


0 


2 


1 


1 


0 


0 


6 


3 


1 


0 


0 


16 


9 


8 


6 


15 


6(7) 


4 


1 


0 


0 


5 


4 


2 


0 


0 


18(19) 


7(9) 


6 


1 


0 


126 


56(57) 


51 


23 


78 


21 


8 




5 


0 


27 


6(7) 


12(13) 


1 


0 


4 


1 


2 


1 


0 
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the Human genome 



Accession number | Domain name | Domain description | H | F | W | Y | A

PF00620 | RhoGAP | RhoGAP domain | 59 | 19 | 20 | 9 | 8
PF00621 | RhoGEF | RhoGEF domain | 46 | 23(24) | 18(19) | 3 | 0
PF00536 | SAM | SAM domain (Sterile alpha motif) | 29(31) | 15 | 8 | 3 | 6
PF01369 | Sec7 | Sec7 domain | 13 | 5 | 5 | 5 | 9
PF00017 | SH2 | Src homology 2 (SH2) domain | 87(95) | 33(39) | 44(48) | 1 | 3
PF00018 | SH3 | Src homology 3 (SH3) domain | 143(182) | 55(75) | 46(61) | 23(27) | 4
PF01017 | STAT | STAT protein | 7 | 1 | 1(2) | 0 | 0
PF00790 | VHS | VHS domain | 4 | 2 | 4 | 4 | 8
PF00568 | WH1 | WH1 domain | 7 | 2 | 2(3) | 1 | 0

Domains involved in apoptosis

PF00452 | Bcl-2 | Bcl-2 | 9 | 2 | 1 | 0 | 0
PF02180 | BH4 | Bcl-2 homology region 4 | 3 | 0 | 1 | 0 | 0
PF00619 | CARD | Caspase recruitment domain | 16 | 0 | 2 | 0 | 0
PF00531 | Death | Death domain | 16 | 5 | 7 | 0 | 0
PF01335 | DED | Death effector domain | 4(5) | 0 | 0 | 0 | 0
PF02179 | BAG | Domain present in Hsp70 regulators | 5(8) | 3 | 2 | 1 | 5
PF00656 | ICE_p20 | ICE-like protease (caspase) p20 domain | 11 | 7 | 3 | 0 | 0
PF00653 | BIR | Inhibitor of Apoptosis domain | 8(14) | 5(9) | 2(3) | 1(2) | 0

Cytoskeletal

PF00022 | Actin | Actin | 61(64) | 15(16) | 12 | 9(11) | 24
PF00191 | Annexin | Annexin | 16(55) | 4(16) | 4(11) | 0 | 6(16)
PF00402 | Calponin | Calponin family | 13(22) | 3 | 7(19) | 0 | 0
PF00373 | Band_41 | FERM domain (Band 4.1 family) | 29(30) | 17(19) | 11(14) | 0 | 0
PF00880 | Nebulin_repeat | Nebulin repeat | 4(148) | 1(2) | 1 | 0 | 0
PF00681 | Plectin_repeat | Plectin repeat | 2(11) | 0 | 0 | 0 | 0
PF00435 | Spectrin | Spectrin repeat | 31(195) | 13(171) | 10(93) | 0 | 0
PF00418 | Tubulin-binding | Tau and MAP proteins, tubulin-binding | 4(12) | 1(4) | 2(8) | 0 | 0
PF00992 | Troponin | Troponin | 4 | 6 | 8 | 0 | 0
PF02209 | VHP | Villin headpiece domain | 5 | 2 | 2 | 0 | 5
PF01044 | Vinculin | Vinculin family | 4 | 2 | 1 | 0 | ?

ECM adhesion

PF01391 | Collagen | Collagen triple helix repeat (20 copies) | 65(279) | 10(46) | 174(384) | 0 | 0
PF01413 | C4 | C-terminal tandem repeated domain in type 4 procollagen | 6(11) | 2(4) | 3(6) | 0 | 0
PF00431 | CUB | CUB domain | 47(69) | 9(47) | 43(67) | 0 | 0
PF00008 | EGF | EGF-like domain | 108(420) | 45(186) | 54(157) | 0 | 1
PF00147 | Fibrinogen_C | Fibrinogen beta and gamma chains, C-terminal globular domain | 26 | 10(11) | 6 | 0 | 0
PF00041 | Fn3 | Fibronectin type III domain | 106(545) | 42(168) | 34(156) | 0 | 1
PF00757 | Furin-like | Furin-like cysteine rich region | 5 | 2 | 1 | 0 | 0
PF00357 | Integrin_A | Integrin alpha cytoplasmic region | 3 | 1 | 2 | 0 | 0
PF00362 | Integrin_B | Integrins, beta chain | 8 | 2 | 2 | 0 | 0
PF00052 | Laminin_B | Laminin B (Domain IV) | 8(12) | 4(7) | 6(10) | 0 | 0
PF00053 | Laminin_EGF | Laminin EGF-like (Domains III and V) | 24(126) | 9(62) | 11(65) | 0 | 0
PF00054 | Laminin_G | Laminin G domain | 30(57) | 18(42) | 14(26) | 0 | 0
PF00055 | Laminin_Nterm | Laminin N-terminal (Domain VI) | 10 | 6 | 4 | 0 | 0
PF00059 | Lectin_c | Lectin C-type domain | 47(76) | 23(24) | 91(132) | 0 | 0
PF01463 | LRRCT | Leucine rich repeat C-terminal domain | 69(81) | 23(30) | 7(9) | 0 | 0
PF01462 | LRRNT | Leucine rich repeat N-terminal domain | 40(44) | 7(13) | 3(6) | 0 | 0
PF00057 | Ldl_recept_a | Low-density lipoprotein receptor domain class A | 35(127) | 33(152) | 27(113) | 0 | 0
PF00058 | Ldl_recept_b | Low-density lipoprotein receptor repeat class B | 15(96) | 9(56) | 7(22) | 0 | 0
PF00530 | SRCR | Scavenger receptor cysteine-rich domain | 11(46) | 4(8) | 1(2) | 0 | 0
PF00084 | Sushi | Sushi domain (SCR repeat) | 53(191) | 11(42) | 8(45) | 0 | 0
PF00090 | Tsp_1 | Thrombospondin type 1 domain | 41(66) | 11(23) | 18(47) | 0 | 0
PF00092 | Vwa | von Willebrand factor type A domain | 34(58) | 0 | 17(19) | 0 | 1
PF00093 | Vwc | von Willebrand factor type C domain | 19(28) | 6(11) | 2(5) | 0 | 0
PF00094 | Vwd | von Willebrand factor type D domain | 15(35) | 3(7) | 9 | 0 | 0

Protein interaction domains

PF00244 | 14-3-3 | 14-3-3 proteins | 20 | 3 | 3 | 2 | 15
PF00023 | Ank | Ank repeat | 145(404) | 72(269) | 75(223) | 12(20) | 66(111)
PF00514 | Armadillo_seg | Armadillo/beta-catenin-like repeats | 22(56) | 11(38) | 3(11) | 2(10) | 25(67)
PF00168 | C2 | C2 domain | 73(101) | 32(44) | 24(35) | 6(9) | 66(90)
PF00027 | cNMP_binding | Cyclic nucleotide-binding domain | 26(31) | 21(33) | 15(20) | 2(3) | 22
PF01556 | DnaJ_C | DnaJ C terminal region | 12 | 9 | 5 | 3 | 19
PF00226 | DnaJ | DnaJ domain | 44 | 34 | 33 | 20 | 93
PF00036 | Efhand** | EF hand | 83(151) | 64(117) | 41(86) | 4(11) | 120(328)
PF00611 | FCH | Fes/CIP4 homology domain | 9 | 3 | 2 | 4 | 0
PF01846 | FF | FF domain | 4(11) | 4(10) | 3(16) | 2(5) | 4(8)
PF00498 | FHA | FHA domain | 13 | 15 | 7 | 13(14) | 17
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myelin proteins result in severe demyelina- 
tion, which is a pathological condition in 
which the myelin is lost and the nerve conduction is severely impaired
(130). Humans have at least 10 genes belonging to four different families
involved in myelin production (five myelin P0, three myelin proteolipid,
myelin basic protein, and myelin-oligodendrocyte glycoprotein, or MOG), and
possibly more-remotely related members of the MOG family. Flies have only a
single myelin proteolipid, and worms have none at all.

Table 18 (Continued)



Intercellular and intracellular signaling pathways in development and
homeostasis. Many protein families that have expanded in humans relative to
the invertebrates are involved in signaling processes, particularly in
response to development and differentiation



Accession number | Domain name | Domain description | H | F | W | Y | A

PF00254 | FKBP | FKBP-type peptidyl-prolyl cis-trans isomerases | 15(20) | 7(?) | 7(13) | 4 | 24(29)
PF01590 | GAF | GAF domain | 7(8) | 2(4) | 1 | 0 | 10
PF01344 | Kelch | Kelch motif | 54(157) | 12(48) | 13(41) | 3 | 102(178)
PF00560 | LRR** | Leucine Rich Repeat | 25(30) | 24(30) | 7(11) | 1 | 15(16)
PF00917 | MATH | MATH domain | 11 | 5 | 88(161) | 1 | 61(74)
PF00989 | PAS | PAS domain | 18(19) | 9(10) | 6 | 1 | 13(18)
PF00595 | PDZ | PDZ domain (Also known as DHR or GLGF) | 96(154) | 60(87) | 46(66) | 2 | 5
PF00169 | PH | PH domain | 193(212) | 72(78) | 65(68) | 24 | 23
PF01535 | PPR** | PPR repeat | 5 | 3(4) | 0 | 1 | 474(2485)
PF00536 | SAM | SAM domain (Sterile alpha motif) | 29(31) | 15 | 8 | 3 | 6
PF01369 | Sec7 | Sec7 domain | 13 | 5 | 5 | 5 | 9
PF00017 | SH2 | Src homology 2 (SH2) domain | 87(95) | 33(39) | 44(48) | 1 | 3
PF00018 | SH3 | Src homology 3 (SH3) domain | 143(182) | 55(75) | 46(61) | 23(27) | 4
PF01740 | STAS | STAS domain | 5 | 1 | 6 | 2 | 13
PF00515 | TPR** | TPR domain | 72(131) | 39(101) | 28(54) | 16(31) | 65(124)
PF00400 | WD40 | WD40 domain | 136(305) | 98(226) | 72(153) | 56(121) | 167(344)
PF00397 | WW | WW domain | 32(53) | 24(39) | 16(24) | 5(8) | 11(15)
PF00569 | ZZ | ZZ-Zinc finger present in dystrophin, CBP/p300 | 10(11) | 13 | 10 | 2 | 10

Nuclear interaction domains

PF01754 | Zf-A20 | A20-like zinc finger | 2(8) | 2 | 2 | 0 | 8
PF01388 | ARID | ARID DNA binding domain | 11 | 6 | 4 | 2 | 7
PF01426 | BAH | BAH domain | 8(10) | 7(8) | 4(5) | 5 | 21(25)
PF00643 | Zf-B_box | B-box zinc finger | 32(35) | 1 | 2 | 0 | 0
PF00533 | BRCT | BRCA1 C Terminus (BRCT) domain | 17(28) | 10(18) | 23(35) | 10(16) | 12(16)
PF00439 | Bromodomain | Bromodomain | 37(48) | 16(22) | 18(26) | 10(15) | 28
PF00651 | BTB | BTB/POZ domain | 97(98) | 62(64) | 86(91) | 1(2) | 30(31)
PF00145 | DNA_methylase | C-5 cytosine-specific DNA methylase | 3(4) | 1 | 0 | 0 | 13(15)
PF00385 | Chromo | 'chromo' (CHRromatin Organization Modifier) domain | 24(27) | 14(15) | 17(18) | 1(2) | 12
PF00125 | Histone | Core histone H2A/H2B/H3/H4 | 75(81) | 5 | 71(73) | 8 | 48
PF00134 | Cyclin | Cyclin | 19 | 10 | 10 | 11 | 35
PF00270 | DEAD | DEAD/DEAH box helicase | 63(66) | 48(50) | 55(57) | 50(52) | 84(87)
PF01529 | Zf-DHHC | DHHC zinc finger domain | 15 | 20 | 16 | 7 | 22
PF00646 | F-box** | F-box domain | 16 | 15 | 309(324) | 9 | 165(167)
PF00250 | Fork_head | Fork head domain | 35(36) | 20(21) | 15 | 4 | 0
PF00320 | GATA | GATA zinc finger | 11(17) | 5(6) | 8(10) | 9 | 26
PF01585 | G-patch | G-patch domain | 18 | 16 | 13 | 4 | 14(15)
PF00010 | HLH** | Helix-loop-helix DNA-binding domain | 60(61) | 44 | 24 | 4 | 39
PF00850 | Hist_deacetyl | Histone deacetylase family | 12 | 5(6) | 8(10) | 5 | 10
PF00046 | Homeobox | Homeobox domain | 160(178) | 100(103) | 82(84) | 6 | 66
PF01833 | TIG | IPT/TIG domain | 29(53) | 11(13) | 5(7) | 2 | 1
PF02373 | JmjC | JmjC domain | 10 | 4 | 6 | 4 | 7
PF02375 | JmjN | JmjN domain | 7 | 4 | 2 | 3 | 7
PF00013 | KH-domain | KH domain | 28(67) | 14(32) | 17(46) | 4(14) | 27(61)
PF01352 | KRAB | KRAB box | 204(243) | 0 | 0 | 0 | 0
PF00104 | Hormone_rec | Ligand-binding domain of nuclear hormone receptor | 47 | 17 | 142(147) | 0 | 0
PF00412 | LIM | LIM domain containing proteins | 62(129) | 33(83) | 33(79) | 4(7) | 10(16)
PF00917 | MATH | MATH domain | 11 | 5 | 88(161) | 1 | 61(74)
PF00249 | Myb_DNA-binding | Myb-like DNA-binding domain | 32(43) | 18(24) | 17(24) | 15(20) | 243(401)
PF02344 | Myc-LZ | Myc leucine zipper domain | 1 | 0 | 0 | 0 | 0
PF01753 | Zf-MYND | MYND finger | 14 | 14 | 9 | 1 | 7
PF00628 | PHD | PHD-finger | 68(86) | 40(53) | 32(44) | 14(15) | 96(105)
PF00157 | Pou | Pou domain - N-terminal to homeobox domain | 15 | 5 | 4 | 0 | 0
PF02257 | RFX_DNA_binding | RFX DNA-binding domain | 7 | 2 | 1 | 1 | 0
PF00076 | Rrm | RNA recognition motif (a.k.a. RRM, RBD, or RNP domain) | 224(324) | 127(199) | 94(145) | 43(73) | 232(369)
PF02037 | SAP | SAP domain | 15 | 8 | 5 | 5 | 6(7)
PF00622 | SPRY | SPRY domain | 44(51) | 10(12) | 5(7) | 3 | 6
PF01852 | START | START domain | 10 | 2 | 6 | 0 | 23
PF00907 | T-box | T-box | 17(19) | 8 | 22 | 0 | 0
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Table 18 (Continued) 



Accession number | Domain name | Domain description | H | F

PF02135 | Zf-TAZ | TAZ finger | 2(3) | 1(2)
PF01285 | TEA | TEA domain | 4 | 1
PF02176 | Zf-TRAF | TRAF-type zinc finger | 6(9) | 1(3)
PF00352 | TBP | Transcription factor TFIID (or TATA-binding protein; TBP) | 2(4) | 4(8)
PF00567 | TUDOR | TUDOR domain | 9(24) | 9(19)
PF00642 | Zf-CCCH | Zinc finger C-x8-C-x5-C-x3-H type (and similar) | 17(22) | 6(8)
PF00096 | Zf-C2H2** | Zinc finger, C2H2 type | 564(4500) | 234(771)
PF00097 | Zf-C3HC4 | Zinc finger, C3HC4 type (RING finger) | 135(137) | 57
PF00098 | Zf-CCHC | Zinc knuckle | 9(17) | 6(10)



w 


Y 


6(7) 


0 


1 


1 


1 


0 


2(4) 


1(2)

(Tables 18 and 19). They include secreted hormones and growth factors,
receptors, intracellular signaling molecules, and transcription factors.

Developmental signaling molecules that are enriched in the human genome
include growth factors such as wnt, transforming growth factor-β (TGF-β),
fibroblast growth factor (FGF), nerve growth factor, platelet-derived growth
factor (PDGF), and ephrins. These growth factors affect tissue
differentiation and a wide range of cellular processes involving
actin-cytoskeletal and nuclear regulation. The corresponding receptors of
these developmental ligands are also expanded in humans. For example, our
analysis suggests at least 8 human ephrin genes (2 in the fly, 4 in the
worm) and 12 ephrin receptors (2 in the fly, 1 in the worm). In the wnt
signaling pathway, we find 18 wnt family genes (6 in the fly, 5 in the worm)
and 12 frizzled receptors (6 in the fly, 5 in the worm). The Groucho family
of transcriptional corepressors downstream in the wnt pathway is even more
markedly expanded, with 13 predicted members in humans (2 in the fly, 1 in
the worm).
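
The gene counts quoted in this paragraph can be turned into fold-expansion
ratios with a few lines of Python. This is only an illustrative sketch: the
counts are copied from the text, while the dictionary layout and the function
names (fold_expansion, expansion_table) are assumptions introduced here and
are not part of the original analysis.

```python
# Gene counts as quoted in the text: (human, fly, worm).
# The data structure and names below are illustrative assumptions.
GENE_COUNTS = {
    "ephrin": (8, 2, 4),
    "ephrin receptor": (12, 2, 1),
    "wnt": (18, 6, 5),
    "frizzled": (12, 6, 5),
    "Groucho": (13, 2, 1),
}


def fold_expansion(human, other):
    """Return the human count divided by the count in another organism."""
    if other <= 0:
        raise ValueError("fold expansion is undefined for a zero count")
    return human / other


def expansion_table(counts):
    """Map each family to (fold vs. fly, fold vs. worm)."""
    return {
        family: (fold_expansion(h, f), fold_expansion(h, w))
        for family, (h, f, w) in counts.items()
    }


if __name__ == "__main__":
    for family, (vs_fly, vs_worm) in expansion_table(GENE_COUNTS).items():
        print(f"{family:16s} {vs_fly:5.1f}x vs fly  {vs_worm:5.1f}x vs worm")
```

A simple ratio like this treats every predicted gene as equivalent, so it
says nothing about pseudogenes or annotation differences between genomes.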

Extracellular adhesion molecules involved in signaling are expanded in the
human genome (Tables 18 and 19). The interactions of several of these
adhesion domains with extracellular matrix proteoglycans play a critical
role in host defense, morphogenesis, and tissue repair (131). Consistent
with the well-defined role of heparan sulfate proteoglycans in modulating
these interactions (132), we observe an expansion of the heparin sulfate
sulfotransferases in the human genome relative to worm and fly. These
sulfotransferases modulate tissue differentiation (133). A similar expansion
in humans is noted in structural proteins that constitute the
actin-cytoskeletal architecture: Compared with the fly and worm, we observe
an explosive expansion of the nebulin (35 domains per protein on average),
aggrecan (12 domains per protein on average), and plectin (5 domains per
protein on average) repeats in humans. These repeats are present in proteins
involved in modulating the actin-cytoskeleton with predominant expression in
neuronal, muscle, and vascular tissues.



Comparison across the five sequenced eukaryotic organisms revealed several
expanded protein families and domains involved in cytoplasmic signal
transduction (Table 18). In particular, signal transduction pathways playing
roles in developmental regulation and acquired immunity were substantially
enriched. There is a factor of 2 or greater expansion in humans in the Ras
superfamily GTPases and the GTPase activator and GTP exchange factors
associated with them. Although there are about the same number of tyrosine
kinases in the human and C. elegans genomes, in humans there is an increase
in the SH2, PTB, and ITAM domains involved in phosphotyrosine signal
transduction. Further, there is a twofold expansion of phosphodiesterases in
the human genome compared with either the worm or fly genomes.

The downstream effectors of the intracellular signaling molecules include
the transcription factors that transduce developmental fates. Significant
expansions are noted in the ligand-binding nuclear hormone receptor class of
transcription factors compared with the fly genome, although not to the
extent observed in the worm (Tables 18 and 19). Perhaps the most striking
expansion in humans is in the C2H2 zinc finger transcription factors. Pfam
detects a total of 4500 C2H2 zinc finger domains in 564 human proteins,
compared with 771 in 234 fly proteins. This means that there has been a
dramatic expansion not only in the number of C2H2 transcription factors, but
also in the number of these DNA-binding motifs per transcription factor (8
on average in humans, 3.3 on average in the fly, and 2.3 on average in the
worm). Furthermore, many of these transcription factors contain either the
KRAB or SCAN domains, which are not found in the fly or worm genomes. These
domains are involved in the oligomerization of transcription factors and
increase the combinatorial partnering of these factors. In general, most of
the transcription factor domains are shared between the three animal
genomes, but the reassortment of these domains results in organism-specific
transcription factor families. The domain combinations found in the human,
fly, and worm include the BTB with C2H2 in the fly and humans, and
homeodomains alone or in combination with Pou and LIM domains in all of the
animal genomes. In plants, however, a different set of transcription factors
are expanded, namely, the myb family, and a unique set that includes VP1 and
AP2 domain-containing proteins (134). The yeast genome has a paucity of
transcription factors compared with the multicellular eukaryotes, and its
repertoire is limited to the expansion of the yeast-specific C6
transcription factor family involved in metabolic regulation.
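
The per-protein averages quoted for the C2H2 zinc fingers follow directly
from the Table 18 cells, which give a protein count followed by a domain
count in parentheses (for example 564(4500) for human). The sketch below
reproduces that arithmetic. The function names are hypothetical, and the
reading of a bare number as "domains equal proteins" is an assumption made
here, not something stated in this section.

```python
import re

# Matches cells such as "564(4500)", "41 (86)" or a bare "57".
_CELL = re.compile(r"^\s*(\d+)\s*(?:\(\s*(\d+)\s*\))?\s*$")


def parse_cell(cell):
    """Parse a 'proteins(domains)' cell into a (proteins, domains) tuple.

    Assumption: a cell without parentheses means one domain per protein.
    """
    match = _CELL.match(cell)
    if match is None:
        raise ValueError(f"unrecognized table cell: {cell!r}")
    proteins = int(match.group(1))
    domains = int(match.group(2)) if match.group(2) is not None else proteins
    return proteins, domains


def domains_per_protein(cell):
    """Average number of domains per protein for one table cell."""
    proteins, domains = parse_cell(cell)
    if proteins == 0:
        return 0.0
    return domains / proteins


if __name__ == "__main__":
    for organism, cell in (("human", "564(4500)"), ("fly", "234(771)")):
        print(f"{organism}: {domains_per_protein(cell):.1f} C2H2 domains per protein")
```

Cells that were damaged in extraction will raise ValueError rather than be
silently misread, which seems the safer default for a table in this state.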

While we have illustrated expansions in a subset of signal transduction
molecules in the human genome compared with the other eukaryotic genomes, it
should be noted that most of the protein domains are highly conserved. An
interesting observation is that worms and humans have approximately the same
number of both tyrosine kinases and serine/threonine kinases (Table 19). It
is important to note, however, that these are merely counts of the catalytic
domain; the proteins that contain these domains also display a wide
repertoire of interaction domains with significant combinatorial diversity.

Hemostasis. Hemostasis is regulated primarily by plasma proteases of the
coagulation pathway and by the interactions that occur between the vascular
endothelium and platelets. Consistent with known anatomical and
physiological differences between vertebrates and invertebrates,
extracellular adhesion domains that constitute proteins integral to
hemostasis are expanded in the human relative to the fly and worm (Tables 18
and 19). We note the evolution of domains such as FIMAC, FN1, FN2, and C1q
that mediate surface interactions between hematopoietic cells and the
vascular matrix. In addition, there has been extensive recruitment of
more-ancient animal-specific domains such as VWA, VWC, VWD, kringle, and FN3
into multidomain proteins that are involved in hemostatic regulation.
Although we do not find a large expansion in the total number of serine
proteases, this enzymatic domain has been specifically recruited into
several of these multidomain proteins for proteolytic regulation in the
vascular compartment. These are represented in plasma proteins that belong
to the kinin and complement pathways. There is a significant expansion in
two families of matrix metalloproteases: ADAM (a disintegrin and
metalloprotease) and MMPs (matrix metalloproteases) (Table 19). Proteolysis
of extracellular matrix (ECM) proteins is critical for tissue development
and for tissue degradation in diseases such as cancer, arthritis,
Alzheimer's disease, and a variety of inflammatory conditions (135, 136).
ADAMs are a family of integral membrane proteins with a pivotal role in
fibrinogenolysis and modulating interactions between hematopoietic
components and the vascular matrix components. These proteins have been
shown to cleave matrix proteins, and even signaling molecules: ADAM-17
converts tumor necrosis factor-α, and ADAM-10 has been implicated in the
Notch signaling pathway (135). We have identified 19 members of the matrix
metalloprotease family, and a total of 51 members of the ADAM and ADAM-TS
families.

Apoptosis. Evolutionary conservation of some of the apoptotic pathway
components across eukarya is consistent with its central role in
developmental regulation and as a response to pathogens and stress signals.
The signal transduction pathways involved in programmed cell death, or
apoptosis, are mediated by interactions between well-characterized domains
that include extracellular domains, adaptor (protein-protein interaction)
domains, and those found in effector and regulatory enzymes (137). We
enumerated the protein counts of central adaptor and effector enzyme domains
that are found only in the apoptotic pathways to provide an estimate of
divergence across eukarya and relative expansion in the human genome when
compared with the fly and worm (Table 18). Adaptor domains found in proteins
restricted only to apoptotic regulation such as the DED domains are
vertebrate-specific, whereas others like BIR, CARD, and Bcl2 are represented
in the fly and worm (although the number of Bcl2 family members in humans is
significantly expanded). Although plants and yeast lack the caspases,
caspase-like molecules, namely the para- and meta-caspases, have been
reported in these organisms (138). Compared with other animal genomes, the
human genome shows an expansion in the adaptor and effector
domain-containing proteins involved in apoptosis, as well as in the
proteases involved in the cascade such as the caspase and calpain families.

Expansions of other protein families.
Metabolic enzymes. There are fewer cytochrome P450 genes in humans than in
either the fly or worm. Lipoxygenases (six in humans), on the other hand,
appear to be specific to the vertebrates and plants, whereas the
lipoxygenase-activating proteins (four in humans) may be
vertebrate-specific. Lipoxygenases are involved in arachidonic acid
metabolism, and they and their activators have been implicated in diverse
human pathology ranging from allergic responses to cancers. One of the most
surprising human expansions, however, is in the number of
glyceraldehyde-3-phosphate dehydrogenase (GAPDH) genes (46 in humans, 3 in
the fly, and 4 in the worm). There is, however, evidence for many
retrotransposed GAPDH pseudogenes (139), which may account for this apparent
expansion. However, it is interesting that GAPDH, long known as a conserved
enzyme involved in basic metabolism found across all phyla from bacteria to
humans, has recently been shown to have other functions. It has a second
catalytic activity, as a uracil DNA glycosylase (140) and functions as a
cell cycle regulator (141) and has even been implicated in apoptosis (142).

Translation. Another striking set of human expansions has occurred in
certain families involved in the translational machinery. We identified 28
different ribosomal subunits that each have at least 10 copies in the
genome; on average, for all ribosomal proteins there is about an 8- to
10-fold expansion in the number of genes relative to either the worm or fly.
Retrotransposed pseudogenes

Table 19 (Continued) 
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may account for many of these expansions 
[see the discussion above and (143)]. Recent evidence suggests that a number
of ribosomal proteins have secondary functions independent of their
involvement in protein biosynthesis; for example, L13a and the related L7
subunits (36 copies in humans) have been shown to induce apoptosis (144).

There is also a four- to fivefold expansion in the elongation factor 1-alpha
family (eEF1A; 56 human genes). Many of these expansions likely represent
intronless paralogs that have presumably arisen from retrotransposition, and
again there is evidence that many of these may be pseudogenes (145).
However, a second form (eEF1A2) of this factor has been identified with
tissue-specific expression in skeletal muscle and a complementary expression
pattern to the ubiquitously expressed eEF1A (146).

Ribonucleoproteins. Alternative splicing results in multiple transcripts
from a single gene, and can therefore generate additional diversity in an
organism's protein complement. We have identified 269 genes for
ribonucleoproteins. This represents over 2.5 times the number of
ribonucleoprotein genes in the worm, two times that of the fly, and about
the same as the 265 identified in the Arabidopsis genome. Whether the
diversity of ribonucleoprotein genes in humans contributes to gene
regulation at either the splicing or translational level is unknown.

Posttranslational modifications. In this set of processes, the most
prominent expansion is the transglutaminases, calcium-dependent enzymes that
catalyze the cross-linking of proteins in cellular processes such as
hemostasis and apoptosis (147). The vitamin K-dependent gamma carboxylase
gene product acts on the GLA domain (missing in the fly and worm) found in
coagulation factors, osteocalcin, and matrix GLA protein (148).
Tyrosylprotein sulfotransferases participate in the posttranslational
modification of proteins involved in inflammation and hemostasis, including
coagulation factors and chemokine receptors (149). Although there is no
significant numerical increase in the counts for domains involved in nuclear
protein modification, there are a number of domain arrangements in the
predicted human proteins that are not found in the other currently sequenced
genomes. These include the tandem association of two histone deacetylase
domains in HD6 with a ubiquitin finger domain, a feature lacking in the fly
genome. An additional example is the co-occurrence of the important nuclear
regulatory enzyme PARP (poly-ADP ribosyl transferase) domain fused to
protein-interaction domains (BRCT and VWA) in humans.

Concluding remarks. There are several possible explanations for the
differences in phenotypic complexity observed in humans when compared to the
fly and worm. Some of these relate to the prominent differences in the
immune system, hemostasis, neuronal, vascular, and cytoskeletal complexity.
The finding that the human genome contains fewer genes than previously
predicted might be compensated for by combinatorial diversity generated at
the levels of protein architecture, transcriptional and translational
control, posttranslational modification of proteins, or posttranscriptional
regulation. Extensive domain shuffling to increase or alter combinatorial
diversity can provide an exponential increase in the ability to mediate
protein-protein interactions without dramatically increasing the absolute
size of the protein complement (150). Evolution of apparently new (from the
perspective of sequence analysis) protein domains and increasing regulatory
complexity by domain accretion both quantitatively and qualitatively
(recruitment of novel domains with preexisting ones) are two features that
we observe in humans. Perhaps the best illustration of this trend is the
C2H2 zinc finger-containing transcription factors, where we see expansion in
the number of domains per protein, together with vertebrate-specific domains
such as KRAB and SCAN. Recent reports on the prominent use of internal
ribosomal entry sites in the human genome to regulate translation of
specific classes of proteins suggest that this is an area that needs further
research to identify the full extent of this process in the human genome
(151). At the posttranslational level, although we provide examples of
expansions of some protein families involved in these modifications, further
experimental evidence is required to evaluate whether this is correlated
with increased complexity in protein processing. Posttranscriptional
processing and the extent of isoform generation in the human remain to be
cataloged in their entirety. Given the conserved nature of the spliceosomal
machinery, further analysis will be required to dissect regulation at this
level.




zinc finger-containing†
CREB
ETS-related
Forkhead-related
FOS
Groucho
Histone H1
Histone H2A
Histone H2B
Histone H3
Histone H4
Homeotic†
ABD-B
Bithoraxoid
Iroquois class
Distal-less
Engrailed
LIM-containing
MEIS/KNOX class
NK-3/NK-2 class
Paired box
Six
Leucine zipper
Nuclear hormone receptor†
Pou-related
Runt-related



8 Conclusions 

8.1 The whole-genome sequencing 
approach versus BAC by BAC 

Experience in applying the whole-genome shotgun sequencing approach to a
diverse group of organisms with a wide range of genome sizes and repeat
content allows us to assess its strengths and weaknesses. With the success
of the method for a large number of microbial genomes, Drosophila, and now
the human, there can be no doubt concerning the utility of this method. The
large number of microbial genomes that have been sequenced by this method
(75, 80, 152) demonstrates that megabase-sized genomes can be sequenced
efficiently without any input other than the de novo mate-paired sequences.
With more complex genomes like those of Drosophila or human, map
information, in the form of well-ordered markers, has been critical for
long-range ordering of scaffolds. For joining scaffolds into chromosomes,
the quality of the map (in terms of the order of the markers) is more
important than the number of markers per se. Although this mapping could
have been performed concurrently with sequencing, the prior existence of
mapping data was beneficial. During the sequencing of the A. thaliana
genome, sequencing of individual BAC clones permitted extension of the
sequence well into centromeric regions and allowed high-quality resolution
of complex repeat regions. Likewise, in Drosophila, the BAC physical map was
most useful in regions near the highly repetitive centromeres and telomeres.
WGA has been found to deliver excellent-quality reconstructions of the
unique regions of the genome. As the genome size, and more importantly the
repetitive content, increases, the WGA approach delivers less of the
repetitive sequence.

The cost and overall efficiency of clone-by-clone
approaches make them difficult to justify
as a stand-alone strategy for future large-scale
genome-sequencing projects. Specific applications
of BAC-based or other clone mapping and
sequencing strategies to resolve ambiguities in
sequence assembly that cannot be efficiently
resolved with computational approaches alone
are clearly worth exploring. Hybrid approaches
to whole-genome sequencing will only work if
there is sufficient coverage in both the
whole-genome shotgun phase and the BAC clone
sequencing phase. Our experience with human
genome assembly suggests that this will require
at least 3X coverage of both whole-genome and
BAC shotgun sequence data.
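
As a rough illustration of why coverage depth matters, the sketch below uses the idealized Lander-Waterman (Poisson) model. That model is an assumption brought in here for illustration and is not part of the analysis above: with reads placed uniformly at random, the expected fraction of bases covered at least once at c-fold coverage is 1 - e^(-c). The function name is hypothetical.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases hit by at least one read under the
    idealized Poisson (Lander-Waterman) model. Real genomes with repeats
    and cloning bias will generally do worse than this."""
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


for c in (1, 3, 6):
    frac = expected_fraction_covered(c)
    print(f"{c}X coverage: {frac:.1%} of bases expected to be covered")
```

Under this simplified model, 3X coverage leaves roughly one base in twenty unsampled in each data set, which gives some intuition for why a hybrid strategy needs adequate depth in both phases.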



8.2 The low gene number in humans 

We have sequenced and assembled ~95% of
the euchromatic sequence of H. sapiens and
used a new automated gene prediction method
to produce a preliminary catalog of the
human genes. This has provided a major surprise:
We have found far fewer genes (26,000
to 38,000) than the earlier molecular predictions
(50,000 to over 140,000). Whatever
the reasons for this current disparity, only
detailed annotation, comparative genomics
(particularly using the Mus musculus genome),
and careful molecular dissection of
complex phenotypes will clarify this critical
issue of the basic "parts list" of our genome.
Certainly, the analysis is still incomplete and
considerable refinement will occur in the
years to come as the precise structure of each
transcription unit is evaluated. A good place
to start is to determine why the gene estimates
derived from EST data are so discordant
with our predictions. It is likely that the
following contribute to an inflated gene number
derived from ESTs: the variable lengths
of 3'- and 5'-untranslated leaders and trailers;
the little-understood vagaries of RNA processing
that often leave intronic regions in an
unspliced condition; the finding that nearly
40% of human genes are alternatively spliced
(153); and finally, the unsolved technical
problems in EST library construction where
contamination from heterogeneous nuclear
RNA and genomic DNA is not uncommon.
Of course, it is possible that there are genes
that remain unpredicted owing to the absence
of EST or protein data to support them, although
our use of mouse genome data for
predicting genes should limit this number. As
was true at the beginning of genome sequencing,
ultimately it will be necessary to measure
mRNA in specific cell types to demonstrate
the presence of a gene.

J. B. S. Haldane speculated in 1937 that a
population of organisms might have to pay a
price for the number of genes it can possibly
carry. He theorized that when the number of
genes becomes too large, each zygote carries
so many new deleterious mutations that the
population simply cannot maintain itself. On
the basis of this premise, and on the basis of
available mutation rates and x-ray-induced
mutations at specific loci, Muller, in 1967
(154), calculated that the mammalian genome
would contain a maximum of not much
more than 30,000 genes (155). An estimate of
30,000 gene loci for humans was also arrived
at by Crow and Kimura (156). Muller's estimate
for D. melanogaster was 10,000 genes,
compared to 13,000 derived by annotation of
the fly genome (26, 27). These arguments for
the theoretical maximum gene number were
based on simplified ideas of genetic load:
that all genes have a certain low rate of
mutation to a deleterious state. However, it is
clear that many mouse, fly, worm, and yeast
knockout mutations lead to almost no discernible
phenotypic perturbations.

The modest number of human genes
means that we must look elsewhere for the
mechanisms that generate the complexities
inherent in human development and the sophisticated
signaling systems that maintain
homeostasis. There are a large number of
ways in which the functions of individual
genes and gene products are regulated. The
degree of "openness" of chromatin structure
and hence transcriptional activity is regulated
by protein complexes that involve histone
and DNA enzymatic modifications. We enumerate
many of the proteins that are likely
involved in nuclear regulation in Table 19.
The location, timing, and quantity of transcription
are intimately linked to nuclear signal
transduction events as well as to the
tissue-specific expression of many of these
proteins. Equally important are regulatory
DNA elements that include insulators, repeats,
and endogenous viruses (157); methylation
of CpG islands in imprinting (158);
and promoter-enhancer and intronic regions
that modulate transcription. The spliceosomal
machinery consists of multisubunit proteins
(Table 19) as well as structural and catalytic
RNA elements (159) that regulate transcript
structure through alternative start and termination
sites and splicing. Hence, there is a
need to study different classes of RNA molecules
(160) such as small nucleolar RNAs,
antisense riboregulator RNA, RNA involved
in X-dosage compensation, and other structural
RNAs to appreciate their precise role in
regulating gene expression. The phenomenon
of RNA editing in which coding changes
occur directly at the level of mRNA is of
clinical and biological relevance (161). Finally,
examples of translational control include
internal ribosomal entry sites that are found
in proteins involved in cell cycle regulation
and apoptosis (162). At the protein level,
minor alterations in the nature of protein-protein
interactions, protein modifications,
and localization can have dramatic effects on
cellular physiology (163). This dynamic system
therefore has many ways to modulate
activity, which suggests that definition of
complex systems by analysis of single genes
is unlikely to be entirely successful.

In situ studies have shown that the human
genome is asymmetrically populated with
G+C content, CpG islands, and genes (68).
However, the genes are not distributed quite
as unequally as had been predicted (Table 9)
(69). The most G+C-rich fraction of the genome,
H3 isochores, constitute more of the
genome than previously thought (about 9%),
and are the most gene-dense fraction, but
contain only 25% of the genes, rather than the
predicted ~40%. The low G+C L isochores
make up 65% of the genome, and 48% of the
genes. This inhomogeneity, the net result of
millions of years of mammalian gene duplication,
has been described as the "desertification"
of the vertebrate genome (77). Why
are there clustered regions of high and low
gene density, and are these accidents of history
or driven by selection and evolution? If
these deserts are dispensable, it ought to be
possible to find mammalian genomes that are
far smaller in size than the human genome.
Indeed, many species of bats have genome
sizes that are much smaller than that of humans;
for example, Miniopterus, a species of
Italian bat, has a genome size that is only
50% that of humans (164). Similarly, Muntiacus,
a species of Asian barking deer, has a
genome size that is ~70% that of humans.
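
The isochore percentages quoted above can be turned into a relative gene density, which makes the asymmetry easier to see. The short sketch below is only a worked calculation on the figures in the text (about 9% of the genome and 25% of the genes for H3; 65% and 48% for the L isochores); the dictionary layout and function name are illustrative choices.

```python
# Share of the genome and share of the genes per isochore class,
# taken from the percentages quoted in the text.
ISOCHORES = {
    "H3 (most G+C-rich)": {"genome_share": 0.09, "gene_share": 0.25},
    "L (low G+C)": {"genome_share": 0.65, "gene_share": 0.48},
}


def relative_gene_density(genome_share: float, gene_share: float) -> float:
    """Gene density relative to the genome-wide average (1.0 = average)."""
    if genome_share <= 0:
        raise ValueError("genome_share must be positive")
    return gene_share / genome_share


for name, shares in ISOCHORES.items():
    density = relative_gene_density(shares["genome_share"], shares["gene_share"])
    print(f"{name}: {density:.2f}x the average gene density")
```

On these figures the H3 fraction is a little under three times as gene-dense as the genome average, while the L isochores sit at roughly three-quarters of the average.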



8.3 Human DNA sequence variation 
and its distribution across the genome 

This is the first eukaryotic genome in which a 
nearly uniform ascertainment of polymorphism 
has been completed. Although we have identified
and mapped more than 3 million SNPs, this
by no means implies that the task of finding and 
cataloging SNPs is complete. These represent 
only a fraction of the SNPs present in the 
human population as a whole. Nevertheless, 
this first glimpse at genome-wide variation has 
revealed strong inhomogeneities in the distribu- 
tion of SNPs across the genome. Polymorphism 
in DNA carries with it a snapshot of the past 
operation of population genetic forces, includ- 
ing mutation, migration, selection, and genetic 
drift. The availability of a dense array of SNPs 
will allow questions related to each of these 
factors to be addressed on a genome-wide basis. 
SNP studies can establish the range of haplotypes
present in subjects of different ethnogeographic
origins, providing insights into population
history and migration patterns. Although
such studies have suggested that modern human 
lineages derive from Africa, many important 
questions regarding human origins remain unanswered,
and more analyses using detailed
SNP maps will be needed to settle these controversies.
In addition to providing evidence for
population expansions, migration, and admixture,
SNPs can serve as markers for the extent
of evolutionary constraint acting on particular 
genes. The correlation between patterns of in- 
traspecies and interspecies genetic variation 
may prove to be especially informative to iden- 
tify sites of reduced genetic diversity that may 
mark loci where sequence variations are not 
tolerated. 

The remarkable heterogeneity in SNP 
density implies that there are a variety of 
forces acting on polymorphism: sparse regions
may have lower SNP density because
the mutation rate is lower, because most of 
those regions have a lower fraction of muta- 
tions that are tolerated, or because recent 
strong selection in favor of a newly arisen 
allele "swept" the linked variation out of the 
population {165). The effect of random ge- 
netic drift also varies widely across the ge- 
nome. The nonrecombining portion of the Y 
chromosome faces the strongest pressure 
from random drift because there are roughly 
one-quarter as many Y chromosomes in the 
population as there are autosomal chromo- 
somes, and the level of polymorphism on the 
Y is correspondingly less. Similarly, the X 
chromosome has a smaller effective popu- 
lation size than the autosomes, and its nu- 
cleotide diversity is also reduced. But even 
across a single autosome, the effective pop- 
ulation size can vary because the density of 
deleterious mutations may vary. Regions of 
high density of deleterious mutations will 
see a greater rate of elimination by selec- 
tion, and the effective population size will 
be smaller (166). As a result, the density of 
even completely neutral SNPs will be lower 
in such regions. There is a large literature 
on the association between SNP density 
and local recombination rates in Drosoph- 
ila, and it remains an important task to 
assess the strength of this association in the 
human genome, because of its impact on 
the design of local SNP densities for dis- 
ease-association studies. It also remains an 
important task to validate SNPs on a 
genomic scale in order to assess the degree 
of heterogeneity among geographic and 
ethnic populations. 
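
The drift argument above can be made concrete by counting chromosome copies. The sketch below assumes equal numbers of breeding males and females, no selection, and the same mutation rate on every chromosome; none of these idealizations comes from the text, which states only the one-quarter figure for the Y and that the X has a smaller effective population size than the autosomes. The three-quarters value for the X follows from the copy counts under those assumptions. Names are illustrative.

```python
from fractions import Fraction

# Chromosome copies carried by one breeding pair (one female, one male).
COPIES_PER_PAIR = {
    "autosome": 4,  # two copies in each parent
    "X": 3,         # two in the female, one in the male
    "Y": 1,         # one copy, in the male only
}


def relative_effective_size(chromosome: str) -> Fraction:
    """Effective population size relative to an autosome, assuming an
    equal sex ratio and no selection."""
    return Fraction(COPIES_PER_PAIR[chromosome], COPIES_PER_PAIR["autosome"])


def expected_diversity(autosomal_diversity: float, chromosome: str) -> float:
    """Neutral diversity scales linearly with effective size when the
    mutation rate is held constant (an idealization)."""
    return autosomal_diversity * float(relative_effective_size(chromosome))


for chrom in COPIES_PER_PAIR:
    # Diversity is expressed relative to an autosomal value of 1.0.
    print(chrom, relative_effective_size(chrom), expected_diversity(1.0, chrom))
```

Departures from these ratios in real data are one way to detect the other forces the paragraph lists, such as selection or regional differences in mutation rate.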






8.4 Genome complexity 

We will soon be in a position to move away
from the cataloging of individual components
of the system, and beyond the simplistic
notions of "this binds to that, which
then docks on this, and then the complex
moves there. . ." (167) to the exciting area
of network perturbations, nonlinear responses
and thresholds, and their pivotal
role in human diseases.
The enumeration of other "parts lists" reveals
that in organisms with complex nervous
systems, neither gene number, neuron number,
nor number of cell types correlates in any
meaningful manner with even simplistic measures
of structural or behavioral complexity.
Nor would they be expected to; this is the realm
of nonlinearities and epigenesis (168). The 520
million neurons of the common octopus exceed
the neuronal number in the brain of a mouse by
an order of magnitude. It is apparent from a
comparison of genomic data on the mouse and
human, and from comparative mammalian
neuroanatomy (169), that the morphological and
behavioral diversity found in mammals is underpinned
by a similar gene repertoire and similar
neuroanatomies. For example, when one
compares a pygmy marmoset (which is only 4
inches tall and weighs about 6 ounces) to a
chimpanzee, the brain volume of this minute
primate is found to be only about 1.5 cm^3, two
orders of magnitude less than that of a chimp
and three orders less than that of humans. Yet
the neuroanatomies of all three brains are strikingly
similar, and the behavioral characteristics
of the pygmy marmoset are little different from
those of chimpanzees. Between humans and
chimpanzees, the gene number, gene structures
and functions, chromosomal and genomic organizations,
and cell types and neuroanatomies
are almost indistinguishable, yet the developmental
modifications that predisposed human
lineages to cortical expansion and development
of the larynx, giving rise to language, culminated
in a massive singularity that by even the
simplest of criteria made humans more complex
in a behavioral sense.

Simple examination of the number of neurons,
cell types, or genes or of the genome
size does not alone account for the differenc- 
es in complexity that we observe. Rather, it is 
the interactions within and among these sets 
that result in such great variation. In addition, 
it is possible that there are "special cases" of 
regulatory gene networks that have a dispro- 
portionate effect on the overall system. We 
have presented several examples of "regula- 
tory genes" that are significantly increased in 
the human genome compared with the fly and 
worm. These include extracellular ligands 
and their cognate receptors (e.g., wnt, frizzled,
TGF-β, ephrins, and connexins), as well
as nuclear regulators (e.g., the KRAB and 
homeodomain transcription factor families), 
where a few proteins control broad develop- 
mental processes. The answers to these 
"complexities" perhaps lie in these expanded 
gene families and differences in the regulato- 
ry control of ancient genes, proteins, path- 
ways, and cells.



8.5 Beyond single components

While few would disagree with the intuitive 
conclusion that Einstein's brain was more 
complex than that of Drosophila, closer com- 
parisons such as whether the set of predicted 
human proteins is more complex than the 
protein set of Drosophila, and if so, to what 
degree, are not straightforward, since protein,
protein domain, or protein-protein interaction
measures do not capture context-dependent
interactions that underpin the dynamics
underlying phenotype.

Currently, there are more than 30 different
mathematical descriptions of complexity (170).
However, we have yet to understand the mathematical
dependency relating the number of
genes with organism complexity. One pragmatic
approach to the analysis of biological systems,
which are composed of nonidentical elements
(proteins, protein complexes, interacting
cell types, and interacting neuronal populations),
is through graph theory (171). The elements
of the system can be represented by the
vertices of complex topographies, with the edges
representing the interactions between them.
Examination of large networks reveals that they
can self-organize, but more important, they can
be particularly robust. This robustness is not
due to redundancy, but is a property of inhomogeneously
wired networks. The error tolerance
of such networks comes with a price; they
are vulnerable to the selection or removal of a
few nodes that contribute disproportionately to
network stability. Gene knockouts provide an
illustration. Some knockouts may have minor
effects, whereas others have catastrophic effects
on the system. In the case of vimentin, a supposedly
critical component of the cytoplasmic
intermediate filament network of mammals, the
knockout of the gene in mice reveals them to be
reproductively normal, with no obvious phenotypic
effects (172), and yet the usually conspicuous
vimentin network is completely absent.
On the other hand, ~30% of knockouts in
Drosophila and mice correspond to critical
nodes whose reduction in gene product, or total
elimination, causes the network to crash most
of the time, although even in some of these
cases, phenotypic normalcy ensues, given the
appropriate genetic background. Thus, there are
no "good" genes or "bad" genes, but only networks
that exist at various levels and at different
connectivities, and at different states of
sensitivity to perturbation. Sophisticated mathematical
analysis needs to be constantly evaluated
against hard biological data sets that specifically
address network dynamics. Nowhere is
this more critical than in attempts to come to
grips with "complexity," particularly because
deconvoluting and correcting complex networks
that have undergone perturbation, and
have resulted in human diseases, is the greatest
significant challenge now facing us.
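
The contrast between error tolerance and vulnerability to the loss of a few key nodes can be seen in a toy graph. The sketch below is a minimal illustration and not a model of any real biological network: it builds a hypothetical hub-and-leaf graph, then compares the largest connected component after removing three peripheral nodes with the result of removing the three best-connected nodes. All names and sizes are made up for the example.

```python
from collections import deque


def build_hub_network(n_hubs=3, leaves_per_hub=10):
    """Toy inhomogeneously wired network: a few hubs linked to each
    other, each carrying many single-link leaf nodes."""
    graph = {}

    def add_edge(a, b):
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    hubs = [f"hub{i}" for i in range(n_hubs)]
    for i in range(n_hubs):
        for j in range(i + 1, n_hubs):
            add_edge(hubs[i], hubs[j])
    for hub in hubs:
        for k in range(leaves_per_hub):
            add_edge(hub, f"{hub}_leaf{k}")
    return graph


def largest_component(graph, removed=()):
    """Size of the largest connected component once the given nodes
    (a stand-in for knockouts) are deleted; breadth-first search."""
    removed = set(removed)
    seen = set()
    best = 0
    for start in graph:
        if start in removed or start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in graph[node]:
                if neighbor not in removed and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


net = build_hub_network()
some_leaves = [n for n in net if "leaf" in n][:3]
top_hubs = sorted(net, key=lambda n: len(net[n]), reverse=True)[:3]

print("intact network:", largest_component(net))
print("three leaves removed:", largest_component(net, some_leaves))
print("three hubs removed:", largest_component(net, top_hubs))
```

Removing peripheral nodes barely changes the connected core, whereas removing the same number of hubs fragments the graph entirely, which is the qualitative behavior the paragraph attributes to inhomogeneously wired networks.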

It has been predicted for the last 15 years 
that complete sequencing of the human genome
would open up new strategies for human
biological research and would have a
major impact on medicine, and through med- 
icine and public health, on society. Effects on 
biomedical research are already being felt. 
This assembly of the human genome se- 
quence is but a first, hesitant step on a long 
and exciting journey toward understanding 
the role of the genome in human biology. It 
has been possible only because of innova- 
tions in instrumentation and software that 
have allowed automation of almost every step 
of the process from DNA preparation to an- 
notation. The next steps are clear: We must 
define the complexity that ensues when this 
relatively modest set of about 30,000 genes is 
expressed. The sequence provides the framework
upon which all the genetics, biochemistry,
physiology, and ultimately phenotype
depend. It provides the boundaries for scien- 
tific inquiry. The sequence is only the first 
level of understanding of the genome. All 
genes and their control elements must be 
identified; their functions, in concert as well 
as in isolation, defined; their sequence varia- 
tion worldwide described; and the relation 
between genome variation and specific phe- 
notypic characteristics determined. Now we 
know what we have to explain. 

Another paramount challenge awaits: 
public discussion of this information and its 
potential for improvement of personal health. 
Many diverse sources of data have shown 
that any two individuals are more than 99.9% 
identical in sequence, which means that all 
the glorious differences among individuals in 
our species that can be attributed to genes 
fall in a mere 0.1% of the sequence. There
are two fallacies to be avoided: determinism, 
the idea that all characteristics of the person 
are "hard-wired" by the genome; and reduc- 
tionism, the view that with complete knowl- 
edge of the human genome sequence, it is 
only a matter of time before our understand- 
ing of gene functions and interactions will 
provide a complete causal description of hu- 
man variability. The real challenge of human 
biology, beyond the task of finding out how 
genes orchestrate the construction and main- 
tenance of the miraculous mechanism of our 
bodies, will lie ahead as we seek to explain 
how our minds have come to organize 
thoughts sufficiently well to investigate our 
own existence.

share at least one significant BLAST hit in common.

>XM_093852 ACCESSION: XM_093852 NID: gi 18556797 ref XM_093852.1
Homo sapiens similar to epidermis specific serine
protease (LOC166414), mRNA
Length =1095 

Score = 507 bits (1291), Expect = e-141 

Identities = 242/244 (99%), Positives = 242/244 (99%), Gaps = 1/244 (0%)
Frame = +1

Query: 1 MGPAGCAFTLLLLLGISVCGQPVYSSRWGGQDAAAGRWPWQVSLHFDHNFIYGGSLVSE 60

MGPAGCAFTLLLLLGISVCGQPVYSSRWGGQDAAAGRWPWQVSLHFDHNFI GGSLVSE 
Sbjct: 1 MGPAGCAFTLLLLLGISVCGQPVYSSRWGGQDAAAGRWPWQVSLHFDHNFICGGSLVSE 180 

Query: 61 RLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTAD-ALLKL 119

RLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTAD ALLKL 
Sbjct: 181 RLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTADVALLKL 360 

Query: 120 SSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQA 179

SSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQA
Sbjct: 361 SSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQA 540

Query: 180 CEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGWSW 239

CEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGWSW 
Sbjct: 541 CEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGWSW 720 

Query: 240 GLEC 243 
GLEC 

Sbjct: 721 GLEC 732

>AX360076 ACCESSION: AX360076 NID: 18675702 Homo sapiens Sequence 32
from Patent WO0200860. gbPatent
Length = 987 

Score = 675 bits (1724), Expect = 0.0

Identities = 325/328 (99%), Positives = 325/328 (99%), Gaps = 2/328 (0%)
Frame = +1 

Query: 1 MGPAGCAFTLLLLLGISVCGQPVYSSRWGGQDAAAGRWPWQVSLHFDHNFIYGGSLVSE 60 

MGPAGCAFTLLLLLGISVCGQPVYSSRWGGQDAAAGRWPWQVSLHFDHNFI GGSLVSE
Sbjct: 1 MGPAGCAFTLLLLLGISVCGQPVYSSRWGGQDAAAGRWPWQVSLHFDHNFICGGSLVSE 180

Query: 61 RLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTAD-ALLKL 119 

RLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTAD ALLKL 
Sbjct: 181 RLILTAAHCIQPTWTTFSYTVWLGSITVGDSRKRVKYYVSKIVIHPKYQDTTADVALLKL 360 

Query: 120 SSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQA 179 

SSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQA 
Sbjct: 361 SSQVTFTSAILPICLPSVTKQLAIPPFCWVTGWGKVKESSDRDYHSALQEAEVPIIDRQA 540 

Query: 180 CEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGWSW 239 

CEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGWSW 
Sbjct: 541 CEQLYNPIGIFLPALEPVIKEDKICAGDTQNMKDSCKGDSGGPLSCHIDGVWIQTGWSW 720 

Query: 240 GLECGKSLPGVYTNVIYYQKWINATISRANNLDFSDFLFPIVLLSLALL-PSCAFGPNTI 298 

GLECGKSLPGVYTNVIYYQKWINATISRANNLDFSDFLFPIVLLSLALL PSCAFGPNTI 
Sbjct: 721 GLECGKSLPGVYTNVIYYQKWINATISRANNLDFSDFLFPIVLLSLALLRPSCAFGPNTI 900 

Query: 299 HRVGTVAEAVACIQGWEENAWRFSPRGR 326

HRVGTVAEAVACIQGWEENAWRFSPRGR
Sbjct: 901 HRVGTVAEAVACIQGWEENAWRFSPRGR 984

Abstract

The present invention relates to protease 
polypeptides, nucleotide sequences encoding 
the protease polypeptides, as well as various 
products and methods useful for the diagnosis 
and treatment of various protease-related 
diseases and conditions.

ARRAY OF OLIGONUCLEOTIDES ON A SOLID
SUBSTRATE

CROSS-REFERENCE TO RELATED
APPLICATIONS

This application is a Rule 60 Division of U.S. application
Ser. No. 850,356, filed Mar. 12, 1992, which is a
Rule 60 Division of U.S. application Ser. No. 492,462,
filed Mar. 7, 1990, now U.S. Pat. No. 5,143,854, which
is a Continuation-in-Part of U.S. application Ser. No.
362,901, filed Jun. 7, 1989, now abandoned, all assigned
to the assignee of the present invention.

The file of this patent contains drawings executed in
color. Copies of this patent with color drawings will be
provided by the Patent and Trademark Office upon
request and payment of the necessary fee.

COPYRIGHT NOTICE 

A portion of the disclosure of this patent document
contains material which is subject to copyright protection.
The copyright owner has no objection to the facsimile
reproduction by anyone of the patent document
or the patent disclosure as it appears in the Patent and
Trademark Office patent file or records, but otherwise
reserves all copyright rights whatsoever.

BACKGROUND OF THE INVENTION 

The present inventions relate to the synthesis and
placement of materials at known locations. In particular,
one embodiment of the inventions provides a
method and associated apparatus for preparing diverse
chemical sequences at known locations on a single substrate
surface. The inventions may be applied, for example,
in the field of preparation of oligomer, peptide,
nucleic acid, oligosaccharide, phospholipid, polymer,
or drug congener preparation, especially to create
sources of chemical diversity for use in screening for
biological activity.

The relationship between structure and activity of
molecules is a fundamental issue in the study of biological
systems. Structure-activity relationships are important
in understanding, for example, the function of enzymes,
the ways in which cells communicate with each
other, as well as cellular control and feedback systems.

Certain macromolecules are known to interact and
bind to other molecules having a very specific three-dimensional
spatial and electronic distribution. Any large
molecule having such specificity can be considered a
receptor, whether it is an enzyme catalyzing hydrolysis
of a metabolic intermediate, a cell-surface protein mediating
membrane transport of ions, a glycoprotein serving
to identify a particular cell to its neighbors, an IgG-class
antibody circulating in the plasma, an oligonucleotide
sequence of DNA in the nucleus, or the like. The
various molecules which receptors selectively bind are
known as ligands.

Many assays are available for measuring the binding
affinity of known receptors and ligands, but the information
which can be gained from such experiments is
often limited by the number and type of ligands which
are available. Novel ligands are sometimes discovered
by chance or by application of new techniques for the
elucidation of molecular structure, including x-ray crystallographic
analysis and recombinant genetic techniques
for proteins.

Small peptides are an exemplary system for exploring
the relationship between structure and function in biology.
A peptide is a sequence of amino acids. When the
twenty naturally occurring amino acids are condensed
into polymeric molecules they form a wide variety of
three-dimensional configurations, each resulting from a
particular amino acid sequence and solvent condition.
The number of possible pentapeptides of the 20 naturally
occurring amino acids, for example, is 20^5 or 3.2
million different peptides. The likelihood that molecules
of this size might be useful in receptor-binding studies is
supported by epitope analysis studies showing that
some antibodies recognize sequences as short as a few
amino acids with high specificity. Furthermore, the
average molecular weight of amino acids puts small
peptides in the size range of many currently useful pharmaceutical
products.
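
The pentapeptide count quoted above is simply the number of length-5 sequences over a 20-letter alphabet. The sketch below checks that arithmetic and, for a shorter length, enumerates the sequences explicitly. The one-letter amino acid codes and the function name are illustrative choices that do not appear in the specification.

```python
from itertools import product

# One-letter codes for the 20 naturally occurring amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_peptides(length: int, alphabet_size: int = len(AMINO_ACIDS)) -> int:
    """Number of distinct linear sequences of the given length."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


print(count_peptides(5))  # 20**5 = 3200000

# Explicit enumeration is cheap for short peptides, e.g. every dipeptide.
dipeptides = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]
print(len(dipeptides))
```

The count grows by a factor of 20 with each added residue, which is why one-at-a-time synthesis becomes impractical as a way to cover sequence space.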

Pharmaceutical drug discovery is one type of re- 
search which relies on such a study of structure-activity 
relationships. In most cases, contemporary pharmaceu- 
tical research can be described as the process of discov- 
ering novel ligands with desirable patterns of specificity 
for biologically important receptors. Another example 
is research to discover new compounds for use in agri- 
culture, such as pesticides and herbicides. 

Sometimes, the solution to a rational process of de- 
signing ligands is difficult or unyielding. Prior methods 
of preparing large numbers of different polymers have 
been painstakingly slow when used at a scale sufficient 
to permit effective rational or random screening. For 
example, the "Merrifield" method (J. Am. Chem. Soc.
(1963) 85:2149-2154, which is incorporated herein by
reference for all purposes) has been used to synthesize 
peptides on a solid support. In the Merrifield method, 
an amino acid is covalently bonded to a support made of 
an insoluble polymer. Another amino acid with an alpha 
protected group is reacted with the covalently bonded 
amino acid to form a dipeptide. After washing, the 
protective group is removed and a third amino acid 
with an alpha protective group is added to the dipep- 
tide. This process is continued until a peptide of a de- 
sired length and sequence is obtained. Using the Merri- 
field method, it is not economically practical to synthe- 
size more than a handful of peptide sequences in a day. 

To synthesize larger numbers of polymer sequences, 
it has also been proposed to use a series of reaction 
vessels for polymer synthesis. For example, a tubular 
reactor system may be used to synthesize a linear poly- 
mer on a solid phase support by automated sequential 
addition of reagents. This method still does not enable 
the synthesis of a sufficiently large number of polymer 
sequences for effective economical screening. 

Methods of preparing a plurality of polymer sequen- 
ces are also known in which a porous container encloses 
a known quantity of reactive particles, the particles 
being larger in size than the pores of the container. The
containers may be selectively reacted with desired ma- 
terials to synthesize desired sequences of product mole- 
cules. As with other methods known in the art, this 
method cannot practically be used to synthesize a suffi- 
cient variety of polypeptides for effective screening. 

Other techniques have also been described. These 
methods include the synthesis of peptides on 96 plastic 
pins which fit the format of standard microtiter plates.
Unfortunately, while these techniques have been some- 
what useful, substantial problems remain. For example, 
these methods continue to be limited in the diversity of 
sequences which can be economically synthesized and 
screened.

From the above, it is seen that an improved method
and apparatus for synthesizing a variety of chemical
sequences at known locations is desired.

SUMMARY OF THE INVENTION

An improved method and apparatus for the preparation
of a variety of polymers is disclosed.

In one preferred embodiment, linker molecules are
provided on a substrate. A terminal end of the linker
molecules is provided with a reactive functional group
protected with a photoremovable protective group.
Using lithographic methods, the photoremovable protective
group is exposed to light and removed from the
linker molecules in first selected regions. The substrate
is then washed or otherwise contacted with a first monomer
that reacts with exposed functional groups on the
linker molecules. In a preferred embodiment, the monomer
is an amino acid containing a photoremovable protective
group at its amino or carboxy terminus and the
linker molecule terminates in an amino or carboxy acid
group bearing a photoremovable protective group.

A second set of selected regions is, thereafter, exposed
to light and the photoremovable protective group
on the linker molecule/protected amino acid is removed
at the second set of regions. The substrate is then
contacted with a second monomer containing a
photoremovable protective group for reaction with
exposed functional groups. This process is repeated to
selectively apply monomers until polymers of a desired
length and desired chemical sequence are obtained.
Photolabile groups are then optionally removed and the
sequence is, thereafter, optionally capped. Side chain
protective groups, if present, are also removed.

By using the lithographic techniques disclosed
herein, it is possible to direct light to relatively small
and precisely known locations on the substrate. It is,
therefore, possible to synthesize polymers of a known
chemical sequence at known locations on the substrate.

The resulting substrate will have a variety of uses
including, for example, screening large numbers of polymers
for biological activity. To screen for biological
activity, the substrate is exposed to one or more receptors
such as antibodies, whole cells, receptors on vesicles,
lipids, or any one of a variety of other receptors.

mask is placed on or focused on the substrate and illuminated
so as to deprotect selected regions of the substrate
in the reactor space. A monomer is pumped through the
reactor space or otherwise contacted with the substrate
and reacts with the deprotected regions. By selectively
deprotecting regions on the substrate and flowing predetermined
monomers through the reactor space, desired
polymers at known locations may be synthesized.

Improved detection apparatus and methods are also
disclosed. The detection method and apparatus utilize a
substrate having a large variety of polymer sequences at
known locations on a surface thereof. The substrate is
exposed to a fluorescently labeled receptor which binds
to one or more of the polymer sequences. The substrate
is placed in a microscope detection apparatus for identification
of locations where binding takes place. The
microscope detection apparatus includes a monochromatic
or polychromatic light source for directing light
at the substrate, means for detecting fluoresced light
from the substrate, and means for determining a location
of the fluoresced light. The means for detecting
light fluoresced on the substrate may in some embodiments
include a photon counter. The means for determining
a location of the fluoresced light may include an
x/y translation table for the substrate. Translation of the
slide and data collection are recorded and managed by
an appropriately programmed digital computer.

A further understanding of the nature and advantages
of the inventions herein may be realized by reference to
the remaining portions of the specification and the attached
drawings.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 illustrates masking and irradiation of a substrate
at a first location. The substrate is shown in cross-section;

FIG. 2 illustrates the substrate after application of a
monomer "A";

FIG. 3 illustrates irradiation of the substrate at a
second location;

FIG. 4 illustrates the substrate after application of
monomer "B";

FIG. 5 illustrates irradiation of the "A" monomer;

The receptors are preferably labeled with, for example, 45 FIG- 6 illustrates the substrate after a second applica- 

a fluorescent marker, radioactive marker, or a labeled ^on of "B"; 

antibody reactive with the receptor. The location of the FIG- 7 illustrates a completed substrate; 

marker on the substrate is detected with, for example, FIGS. 8A and 8B illustrate alternative embodiments 

photon detection or autoradiographic techniques. °f a reactor system for forming a plurality of polymers 

Through knowledge of the sequence of the material at 50 °a a substrate; 

the location where binding is detected, it is possible to FIG. 9 illustrates a detection apparatus for locating 

quickly determine which sequence binds with the re- fluorescent markers on the substrate; 

ceptor and, therefore, the technique can be used to FIGS. 10A-10M illustrate the method as it is applied 

screen large numbers of peptides. Other possible appli- to the production of the trimers of monomers "A" and 

cations of the inventions herein include diagnostics in 55 "B"; 

which various antibodies for particular receptors would FIGS. 11A and 11B are fluorescence traces for stan- 

be placed on a substrate and, for example, blood sera dard fluorescent beads; 

would be screened for immune deficiencies. Still further FIGS. 12A and 12B are fluorescence curves for 

applications include, for example, selective "doping" of NVOC (6-nitroveratryloxycarbonyl) slides not exposed 

organic materials in semiconductor devices, and the 60 and exposed to light respectively; 

like. FIGS. 13A to 13D are fluorescence plots of slides 

In connection with one aspect of the invention an exposed through 100 /Am, 50 um, 20 /im, and 10 urn 

improved reactor system for synthesizing polymers is masks; 14A and 14B illustrate formation of YGGFL (a 

also disclosed. The reactor system includes a substrate peptide of sequence H2N-tyrosine-glycine-glycine- 

mount which engages a substrate around a periphery 65 phenylalanine-leucine-C02H) and GGFL (a peptide of 

thereof. The substrate mount provides for a reactor sequence H2N-glycme-glycme-phenylalanine-leucine- 

space between the substrate and the mount through or CO2H), followed by exposure to labeled Herz antibody 

into which reaction fluids are pumped or flowed. A (an antibody that recognizes YGGFL but not GGFL); 
FIGS. 15A and 15B are fluorescence plots of a slide with
a checkerboard pattern of YGGFL and GGFL exposed
to labeled Herz antibody; FIG. 15A illustrates a
500×500 µm mask which has been focused on the sub-
strate according to FIG. 8A while FIG. 15B illustrates
a 50×50 µm mask placed in direct contact with the
substrate in accord with FIG. 8B;

FIG. 16 is a fluorescence plot of YGGFL and
PGGFL synthesized in a 50 µm checkerboard pattern;

FIG. 17 is a fluorescence plot of YPGGFL and
YGGFL synthesized in a 50 µm checkerboard pattern;

FIGS. 18A and 18B illustrate the mapping of sixteen
sequences synthesized on two different glass slides;

FIG. 19 is a fluorescence plot of the slide illustrated
in FIG. 18A; and

FIG. 20 is a fluorescence plot of the slide illustrated
in FIG. 18B.

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 
I. Glossary

The following terms are intended to have the follow-
ing general meanings as they are used herein:

1. Complementary: Refers to the topological compati-
bility or matching together of interacting surfaces of
a ligand molecule and its receptor. Thus, the receptor
and its ligand can be described as complementary,
and furthermore, the contact surface characteristics
are complementary to each other.

2. Epitope: The portion of an antigen molecule which is
delineated by the area of interaction with the subclass
of receptors known as antibodies.

3. Ligand: A ligand is a molecule that is recognized by
a particular receptor. Examples of ligands that can be
investigated by this invention include, but are not
restricted to, agonists and antagonists for cell mem-
brane receptors, toxins and venoms, viral epitopes,
hormones (e.g., steroids, etc.), hormone receptors,
peptides, enzymes, enzyme substrates, cofactors,
drugs (e.g., opiates, etc.), lectins, sugars, oligonucleo-
tides, nucleic acids, oligosaccharides, proteins, and
monoclonal antibodies.

4. Monomer: A member of the set of small molecules 
which can be joined together to form a polymer. The 
set of monomers includes but is not restricted to, for 
example, the set of common L-amino acids, the set of 
D-amino acids, the set of synthetic amino acids, the 
set of nucleotides and the set of pentoses and hexoses. 
As used herein, monomers refers to any member of a 
basis set for synthesis of a polymer. For example, 
dimers of L-amino acids form a basis set of 400 mono- 
mers for synthesis of polypeptides. Different basis 
sets of monomers may be used at successive steps in 
the synthesis of a polymer. 
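
The dimer basis set mentioned in the Monomer definition can be enumerated directly. The following is a minimal Python sketch, assuming the 20 common L-amino acids written in standard one-letter codes; the names AMINO_ACIDS and dimer_basis_set are illustrative only and do not appear in this specification.

```python
from itertools import product

# One-letter codes for the 20 common L-amino acids (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dimer_basis_set(alphabet=AMINO_ACIDS):
    """Return every ordered pair of monomers as a two-letter string."""
    return ["".join(pair) for pair in product(alphabet, repeat=2)]


basis = dimer_basis_set()
print(len(basis))  # 20 * 20 = 400 ordered dimers
```

Ordered pairs are used because a dimer such as "GF" is a different building block from "FG" once it is coupled to a growing chain.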

5. Peptide: A polymer in which the monomers are alpha 
amino acids and which are joined together through 
amide bonds and alternatively referred to as a poly- 
peptide. In the context of this specification it should 
be appreciated that the amino acids may be the L- 
optical isomer or the D-optical isomer. Peptides are 
more than two amino acid monomers long, and often 
more than 20 amino acid monomers long. Standard 
abbreviations for amino acids are used (e.g., P for 
proline). These abbreviations are included in Stryer,
Biochemistry, Third Ed., 1988, which is incorporated
herein by reference for all purposes.

6. Radiation: Energy which may be selectively applied
including energy having a wavelength of between
10^-14 and 10^4 meters including, for example, electron
beam radiation, gamma radiation, x-ray radiation,
ultraviolet radiation, visible light, infrared radiation,
microwave radiation, and radio waves. "Irradiation"
refers to the application of radiation to a surface.

7. Receptor: A molecule that has an affinity for a given
ligand. Receptors may be naturally occurring or man-
made molecules. Also, they can be employed in their
unaltered state or as aggregates with other species. 
Receptors may be attached, covalently or noncova- 
lently, to a binding member, either directly or via a 
specific binding substance. Examples of receptors 
which can be employed by this invention include, but 
are not restricted to, antibodies, cell membrane recep- 
tors, monoclonal antibodies and antisera reactive 
with specific antigenic determinants (such as on vi- 
ruses, cells or other materials), drugs, polynucleo- 
tides, nucleic acids, peptides, cofactors, lectins, sug- 
ars, polysaccharides, cells, cellular membranes, and 
organelles. Receptors are sometimes referred to in the 
art as anti-ligands. As the term receptors is used 
herein, no difference in meaning is intended. A "Li- 
gand Receptor Pair" is formed when two macromol- 
ecules have combined through molecular recognition 
to form a complex. 

Other examples of receptors which can be investi- 
gated by this invention include but are not restricted to: 

a) Microorganism receptors: Determination of li- 
gands which bind to receptors, such as specific 
transport proteins or enzymes essential to survival 
of microorganisms, is useful in a new class of antibi- 
otics. Of particular value would be antibiotics 
against opportunistic fungi, protozoa, and those 
bacteria resistant to the antibiotics in current use. 
b) Enzymes: For instance, the binding site of enzymes such as the
enzymes responsible for cleaving neurotransmitters; determination of
ligands which bind to certain receptors to modulate the action of the
enzymes which cleave the different neurotransmitters is useful in the
development of drugs which can be used in the treatment of disorders
of neurotransmission.

c) Antibodies: For instance, the invention may be useful in
investigating the ligand-binding site on the antibody molecule which
combines with the epitope of an antigen of interest; determining a
sequence that mimics an antigenic epitope may lead to the development
of vaccines of which the immunogen is based on one or more of such
sequences or lead to the development of related diagnostic agents or
compounds useful in therapeutic treatments such as for autoimmune
diseases (e.g., by blocking the binding of the "self" antibodies).

d) Nucleic Acids: Sequences of nucleic acids may be synthesized to
establish DNA or RNA binding sequences.

e) Catalytic Polypeptides: Polymers, preferably polypeptides, which
are capable of promoting a chemical reaction involving the conversion
of one or more reactants to one or more products. Such polypeptides
generally include a binding site specific for at least one reactant or
reaction intermediate and an active functionality proximate to the
binding site, which functionality is capable of chemically modifying
the bound reactant. Catalytic polypeptides are described in, for
example, U.S. application Ser. No. 404,920, which is incorporated
herein by reference for all purposes.

f) Hormone receptors: For instance, the receptors for insulin and
growth hormone. Determination of the ligands which bind with high
affinity to a receptor is useful in the development of, for example,
an oral replacement of the daily injections which diabetics must take
to relieve the symptoms of diabetes, and in the other case, a
replacement for the scarce human growth hormone which can only be
obtained from cadavers or by recombinant DNA technology. Other
examples are the vasoconstrictive hormone receptors; determination of
those ligands which bind to a receptor may lead to the development of
drugs to control blood pressure.

g) Opiate receptors: Determination of ligands which bind to the opiate
receptors in the brain is useful in the development of less-addictive
replacements for morphine and related drugs.

8. Substrate: A material having a rigid or semi-rigid surface. In many
embodiments, at least one surface of the substrate will be
substantially flat, although in some embodiments it may be desirable
to physically separate synthesis regions for different polymers with,
for example, wells, raised regions, etched trenches, or the like.
According to other embodiments, small beads may be provided on the
surface which may be released upon completion of the synthesis.

9. Protective Group: A material which is bound to a monomer unit and
which may be spatially removed upon selective exposure to an activator
such as electromagnetic radiation. Examples of protective groups with
utility herein include Nitroveratryloxy carbonyl, Nitrobenzyloxy
carbonyl, Dimethyl dimethoxybenzyloxy carbonyl,
5-Bromo-7-nitroindolinyl, o-Hydroxy-α-methyl cinnamoyl, and
2-Oxymethylene anthraquinone. Other examples of activators include ion
beams, electric fields, magnetic fields, electron beams, x-ray, and
the like.

10. Predefined Region: A predefined region is a localized area on a
surface which is, was, or is intended to be activated for formation of
a polymer. The predefined region may have any convenient shape, e.g.,
circular, rectangular, elliptical, wedge-shaped, etc. For the sake of
brevity herein, "predefined regions" are sometimes referred to simply
as "regions."

11. Substantially Pure: A polymer is considered to be "substantially
pure" within a predefined region of a substrate when it exhibits
characteristics that distinguish it from other predefined regions.
Typically, purity will be measured in terms of biological activity or
function as a result of uniform sequence. Such characteristics will
typically be measured by way of binding with a selected ligand or
receptor.

II. General

The present invention provides methods and apparatus for the
preparation and use of a substrate having a plurality of polymer
sequences in predefined regions. The invention is described herein
primarily with regard to the preparation of molecules containing
sequences of amino acids, but could readily be applied in the
preparation of other polymers. Such polymers include, for example,
both linear and cyclic polymers of nucleic acids, polysaccharides,
phospholipids, and peptides having either α-, β-, or ω-amino acids,
heteropolymers in which a known drug is covalently bound to any of the
above, polyurethanes, polyesters, polycarbonates, polyureas,
polyamides, polyethyleneimines, polyarylene sulfides, polysiloxanes,
polyimides, polyacetates, or other polymers which will be apparent
upon review of this disclosure. In a preferred embodiment, the
invention herein is used in the synthesis of peptides.

The prepared substrate may, for example, be used in screening a
variety of polymers as ligands for binding with a receptor, although
it will be apparent that the invention could be used for the synthesis
of a receptor for binding with a ligand. The substrate disclosed
herein will have a wide variety of other uses. Merely by way of
example, the invention herein can be used in determining peptide and
nucleic acid sequences which bind to proteins, finding
sequence-specific binding drugs, identifying epitopes recognized by
antibodies, and evaluation of a variety of drugs for clinical and
diagnostic applications, as well as combinations of the above.

The invention preferably provides for the use of a substrate "S" with
a surface. Linker molecules "L" are optionally provided on a surface
of the substrate. The purpose of the linker molecules, in some
embodiments, is to facilitate receptor recognition of the synthesized
polymers.

Optionally, the linker molecules may be chemically protected for
storage purposes. A chemical storage protective group such as t-BOC
(t-butoxycarbonyl) may be used in some embodiments. Such chemical
protective groups would be chemically removed upon exposure to, for
example, acidic solution and would serve to protect the surface during
storage and be removed prior to polymer preparation.

On the substrate or a distal end of the linker molecules, a functional
group with a protective group P0 is provided. The protective group P0
may be removed upon exposure to radiation, electric fields, electric
currents, or other activators to expose the functional group.

In a preferred embodiment, the radiation is ultraviolet
(UV), infrared (IR), or visible light. As more fully de-
scribed below, the protective group may alternatively
be an electrochemically-sensitive group which may be
removed in the presence of an electric field. In still
further alternative embodiments, ion beams, electron
beams, or the like may be used for deprotection.

In some embodiments, the exposed regions and,
therefore, the area upon which each distinct polymer
sequence is synthesized are smaller than about 1 cm² or
less than 1 mm². In preferred embodiments the exposed
area is less than about 10,000 µm² or, more preferably,
less than 100 µm² and may, in some embodiments, en-
compass the binding site for as few as a single molecule.
Within these regions, each polymer is preferably syn-
thesized in a substantially pure form.

Concurrently or after exposure of a known region of
the substrate to light, the surface is contacted with a
first monomer unit M1 which reacts with the functional
group which has been exposed by the deprotection step.
The first monomer includes a protective group P1. P1
may or may not be the same as P0.

Accordingly, after a first cycle, known first regions
of the surface may comprise the sequence:

S-L-M1-P1

while remaining regions of the surface comprise the
sequence:

S-L-P0

Thereafter, second regions of the surface (which may
include the first region) are exposed to light and con-
tacted with a second monomer M2 (which may or may
not be the same as M1) having a protective group P2. P2
may or may not be the same as P0 and P1. After this
second cycle, different regions of the substrate may
comprise one or more of the following sequences:

S-L-M1-M2-P2, S-L-M2-P2, S-L-M1-P1, and/or S-L-P0.

The above process is repeated until the substrate in-
cludes desired polymers of desired lengths. By control-
ling the locations of the substrate exposed to light and
the reagents exposed to the substrate following expo-
sure, the location of each sequence will be known.
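
The two-cycle example above can be traced with a small simulation. This is only an illustrative Python sketch under simplifying assumptions (every exposed region couples completely, and regions are identified by arbitrary names); the function synthesize and the region labels r1 to r4 are hypothetical and are not part of this disclosure.

```python
def synthesize(regions, cycles, linker="L", initial_group="P0"):
    """Simulate light-directed synthesis on a substrate.

    regions: iterable of region names.
    cycles:  list of (exposed_regions, monomer, protective_group) tuples.
    Returns {region: "S-L-...-P"} describing each region after all cycles.
    """
    # Every region starts as substrate-linker with the initial protective group.
    chains = {r: ["S", linker] for r in regions}
    groups = {r: initial_group for r in regions}
    for exposed, monomer, group in cycles:
        for r in exposed:
            # Exposure removes the terminal protective group; the monomer
            # then couples and brings its own protective group with it.
            chains[r].append(monomer)
            groups[r] = group
    return {r: "-".join(chains[r] + [groups[r]]) for r in regions}


regions = ["r1", "r2", "r3", "r4"]
cycles = [
    ({"r1", "r3"}, "M1", "P1"),  # first cycle: first regions
    ({"r1", "r2"}, "M2", "P2"),  # second cycle: overlaps region r1
]
result = synthesize(regions, cycles)
for region in regions:
    print(region, result[region])
```

Because the exposure pattern and the monomer used in each cycle are recorded, the sequence at every region is known without any measurement, which is the point made in the preceding paragraph.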

Thereafter, the protective groups are removed from
some or all of the substrate and the sequences are, op-
tionally, capped with a capping unit C. The process
results in a substrate having a surface with a plurality of
polymers of the following general formula:

S-[L]-(M_i)-(M_j)-(M_k) . . . (M_x)-[C]

where square brackets indicate optional groups, and M_i
. . . M_x indicates any sequence of monomers. The num-
ber of monomers could cover a wide variety of values,
but in a preferred embodiment they will range from 2 to
100.

In some embodiments a plurality of locations on the
substrate polymers are to contain a common monomer
subsequence. For example, it may be desired to synthe-
size a sequence S-M1-M2-M3 at first locations and a
sequence S-M4-M2-M3 at second locations. The process
would commence with irradiation of the first locations
followed by contacting with M1-P, resulting in the se-
quence S-M1-P at the first location. The second loca-
tions would then be irradiated and contacted with
M4-P, resulting in the sequence S-M4-P at the second
locations. Thereafter both the first and second locations
would be irradiated and contacted with the dimer M2-
M3, resulting in the sequence S-M1-M2-M3 at the first
locations and S-M4-M2-M3 at the second locations. Of
course, common subsequences of any length could be
utilized including those in a range of 2 or more mono-
mers, 2 to 100 monomers, 2 to 20 monomers, and a most
preferred range of 2 to 3 monomers.

According to other embodiments, a set of masks is
used for the first monomer layer and, thereafter, varied
light wavelengths are used for selective deprotection.
For example, in the process discussed above, first re-
gions are first exposed through a mask and reacted with
a first monomer having a first protective group P1,
which is removable upon exposure to a first wavelength
of light (e.g., IR). Second regions are masked and re-
acted with a second monomer having a second protec-
tive group P2, which is removable upon exposure to a
second wavelength of light (e.g., UV). Thereafter,
masks become unnecessary in the synthesis because the
entire substrate may be exposed alternately to the first
and second wavelengths of light in the deprotection
cycle.
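
The maskless, wavelength-selective cycle described above can be sketched as follows. This is a hedged Python illustration, not the disclosed apparatus: the table REMOVED_BY, the function flood_cycle, and the monomer labels M3 and M4 are assumptions introduced for the example, and each added monomer is assumed to carry a protective group of the same wavelength class as the one it replaced.

```python
# Assumed mapping of protective group -> wavelength band that removes it.
REMOVED_BY = {"P1": "IR", "P2": "UV"}


def flood_cycle(surface, wavelength, monomer, new_group):
    """Expose the whole surface to one wavelength, then couple a monomer.

    surface maps region -> (list of monomers, terminal protective group).
    Only regions whose terminal group responds to `wavelength` are extended;
    all other regions are returned unchanged.
    """
    updated = {}
    for region, (chain, group) in surface.items():
        if REMOVED_BY.get(group) == wavelength:
            updated[region] = (chain + [monomer], new_group)
        else:
            updated[region] = (chain, group)
    return updated


# State after the masked first layer: M1-P1 at first regions, M2-P2 at second.
surface = {"first": (["M1"], "P1"), "second": (["M2"], "P2")}
surface = flood_cycle(surface, "IR", "M3", "P1")  # extends only "first"
surface = flood_cycle(surface, "UV", "M4", "P2")  # extends only "second"
print(surface)
```

The design choice illustrated is that selectivity moves from the mask to the protective group, so a whole-surface exposure still addresses only one class of region.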

The polymers prepared on a substrate according to 
the above methods will have a variety of uses including, 
for example, screening for biological activity. In such 
screening activities, the substrate containing the sequen- 
ces is exposed to an unlabeled or labeled receptor such 
as an antibody, receptor on a cell, phospholipid vesicle, 
or any one of a variety of other receptors. In one pre- 
ferred embodiment the polymers are exposed to a first, 
unlabeled receptor of interest and, thereafter, exposed 
to a labeled receptor-specific recognition element, 
which is, for example, an antibody. This process will 
provide signal amplification in the detection stage. 

The receptor molecules may bind with one or more 
polymers on the substrate. The presence of the labeled 
receptor and, therefore, the presence of a sequence 
which binds with the receptor is detected in a preferred 
embodiment through the use of autoradiography, detec- 
tion of fluorescence with a charge-coupled device, fluo- 
rescence microscopy, or the like. The sequence of the 
polymer at the locations where the receptor binding is 
detected may be used to determine all or part of a se- 
quence which is complementary to the receptor. 
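
Reading out a screen amounts to looking up the known sequence at each location where signal is detected. The short Python sketch below illustrates that lookup; the intensity values, the threshold, and the function name binding_sequences are invented for illustration and are not measurements or methods from this disclosure.

```python
def binding_sequences(sequence_map, intensities, threshold):
    """Return the distinct sequences found at locations above threshold.

    sequence_map: {location: sequence synthesized at that location}
    intensities:  {location: detected signal (arbitrary units)}
    """
    hits = {
        sequence_map[loc]
        for loc, signal in intensities.items()
        if signal > threshold
    }
    return sorted(hits)


# Hypothetical 2 x 2 checkerboard of two peptides.
sequence_map = {(0, 0): "YGGFL", (0, 1): "GGFL", (1, 0): "GGFL", (1, 1): "YGGFL"}
# Made-up signal values, chosen only to exercise the lookup.
intensities = {(0, 0): 950, (0, 1): 40, (1, 0): 55, (1, 1): 910}

print(binding_sequences(sequence_map, intensities, threshold=500))
```

In practice the threshold would have to be chosen from background measurements; a fixed number is used here only to keep the sketch short.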

Use of the invention herein is illustrated primarily 
with reference to screening for biological activity. The 
invention will, however, find many other uses. For 
example, the invention may be used in information stor- 
age (e.g., on optical disks), production of molecular 
electronic devices, production of stationary phases in 
separation sciences, production of dyes and brightening 
agents, photography, and in immobilization of cells, 
proteins, lectins, nucleic acids, polysaccharides and the 
like in patterns on a surface via molecular recognition of 
specific polymer sequences. By synthesizing the same 
compound in adjacent, progressively differing concen- 
trations, a gradient will be established to control chemo- 
taxis or to develop diagnostic dipsticks which, for ex- 
ample, titrate an antibody against an increasing amount 
of antigen. By synthesizing several catalyst molecules in 
close proximity, more efficient multistep conversions 
may be achieved by "coordinate immobilization." Coordinate
immobilization also may be used for electron transfer systems, as well
as to provide both structural integrity and other desirable properties
to materials such as lubrication, wetting, etc.

According to alternative embodiments, molecular biodistribution or
pharmacokinetic properties may be examined. For example, to assess
resistance to intestinal or serum proteases, polymers may be capped
with a fluorescent tag and exposed to biological fluids of interest.

III. Polymer Synthesis

FIG. 1 illustrates one embodiment of the invention disclosed herein in
which a substrate 2 is shown in cross-section. Essentially, any
conceivable substrate may be employed in the invention. The substrate
may be biological, nonbiological, organic, inorganic, or a combination
of any of these, existing as particles, strands, precipitates, gels,
sheets, tubing, spheres, containers, capillaries, pads, slices, films,
plates, slides, etc. The substrate may have any convenient shape, such
as a disc, square, sphere, circle, etc. The substrate is preferably
flat but may take on a variety of alternative surface configurations.
For example, the substrate may contain raised or depressed regions on
which the synthesis takes place. The substrate and its surface
preferably form a rigid support on which to carry out the reactions
described herein. The substrate and its surface are also chosen to
provide appropriate light-absorbing characteristics. For instance, the
substrate may be a polymerized Langmuir Blodgett film, functionalized
glass, Si, Ge, GaAs, GaP, SiO2, SiN4, modified silicon, or any one of
a wide variety of gels or polymers such as (poly)tetrafluoroethylene,
(poly)vinylidenedifluoride, polystyrene, polycarbonate, or
combinations thereof. Other substrate materials will be readily
apparent to those of skill in the art upon review of this disclosure.
In a preferred embodiment the substrate is flat glass or
single-crystal silicon with surface relief features of less than 10 Å.

According to some embodiments, the surface of the substrate is etched
using well known techniques to provide for desired surface features.
For example, by way of the formation of trenches, v-grooves, mesa
structures, or the like, the synthesis regions may be more closely
placed within the focus point of impinging light, be provided with
reflective "mirror" structures for maximization of light collection
from fluorescent sources, or the like.

Surfaces on the solid substrate will usually, though not always, be
composed of the same material as the substrate. Thus, the surface may
be composed of any of a wide variety of materials, for example,
polymers, plastics, resins, polysaccharides, silica or silica-based
materials, carbon, metals, inorganic glasses, membranes, or any of the
above-listed substrate materials. In some embodiments the surface may
provide for the use of caged binding members which are attached firmly
to the surface of the substrate. Preferably, the surface will contain
reactive groups, which could be carboxyl, amino, hydroxyl, or the
like. Most preferably, the surface will be optically transparent and
will have surface Si-OH functionalities, such as are found on silica
surfaces.

The surface 4 of the substrate is preferably provided with a layer of
linker molecules 6, although it will be understood that the linker
molecules are not required elements of the invention. The linker
molecules are preferably of sufficient length to permit polymers in a
completed substrate to interact freely with molecules exposed to the
substrate. The linker molecules should be 6-50 atoms long to provide
sufficient exposure. The linker molecules may be, for example, aryl
acetylene, ethylene glycol oligomers containing 2-10 monomer units,
diamines, diacids, amino acids, or combinations thereof. Other linker
molecules may be used in light of this disclosure.

According to alternative embodiments, the linker molecules are
selected based upon their hydrophilic/hydrophobic properties to
improve presentation of synthesized polymers to certain receptors. For
example, in the case of a hydrophilic receptor, hydrophilic linker
molecules will be preferred so as to permit the receptor to more
closely approach the synthesized polymer.

According to another alternative embodiment, linker molecules are also
provided with a photocleavable group at an intermediate position. The
photocleavable group is preferably cleavable at a wavelength different
from the protective group. This enables removal of the various
polymers following completion of the synthesis by way of exposure to
the different wavelengths of light.

The linker molecules can be attached to the substrate via
carbon-carbon bonds using, for example,
(poly)trifluorochloroethylene surfaces or, preferably, by siloxane
bonds (using, for example, glass or silicon oxide surfaces). Siloxane
bonds with the surface of the substrate may be formed in one
embodiment via reactions of linker molecules bearing trichlorosilyl
groups. The linker molecules may optionally be attached in an ordered
array, i.e., as parts of the head groups in a polymerized Langmuir
Blodgett film. In alternative embodiments, the linker molecules are
adsorbed to the surface of the substrate.

The linker molecules and monomers used herein are provided with a
functional group to which is bound a protective group. Preferably, the
protective group is on the distal or terminal end of the linker
molecule opposite the substrate. The protective group may be either a
negative protective group (i.e., the protective group renders the
linker molecules less reactive with a monomer upon exposure) or a
positive protective group (i.e., the protective group renders the
linker molecules more reactive with a monomer upon exposure). In the
case of negative protective groups an additional step of reactivation
will be required. In some embodiments, this will be done by heating.

The protective group on the linker molecules may be selected from a
wide variety of positive light-reactive groups preferably including
nitro aromatic compounds such as o-nitrobenzyl derivatives or
benzylsulfonyl. In a preferred embodiment,
6-nitroveratryloxycarbonyl (NVOC), 2-nitrobenzyloxycarbonyl (NBOC) or
α,α-dimethyl-dimethoxybenzyloxycarbonyl (DDZ) is used. In one
embodiment, a nitro aromatic compound containing a benzylic hydrogen
ortho to the nitro group is used, i.e., a chemical of the form:

where R1 is alkoxy, alkyl, halo, aryl, alkenyl, or hydrogen; R2 is alkoxy, alkyl, halo, aryl,
nitro, or hydrogen; R3 is alkoxy, alkyl, halo, nitro, aryl, or hydrogen; R4 is alkoxy, alkyl,
hydrogen, aryl, halo, or nitro; and R5 is alkyl, alkynyl, cyano, alkoxy, hydrogen, halo, aryl, or
alkenyl. Other materials which may be used include o-hydroxy-α-methyl cinnamoyl derivatives.
Photoremovable protective groups are described in, for example, Patchornik, J. Am. Chem. Soc.
(1970) 92:6333 and Amit et al., J. Org. Chem. (1974) 39:192, both of which are incorporated
herein by reference.

In an alternative embodiment the positive reactive group is activated for reaction with reagents
in solution. For example, a 5-bromo-7-nitro indoline group, when bound to a carbonyl, undergoes
reaction upon exposure to light at 420 nm.

In a second alternative embodiment, the reactive group on the linker molecule is selected from a
wide variety of negative light-reactive groups including a cinnamate group.

Alternatively, the reactive group is activated or deactivated by electron beam lithography, x-ray
lithography, or any other radiation. Suitable reactive groups for electron beam lithography
include sulfonyl. Other methods may be used including, for example, exposure to a current source.
Other reactive groups and methods of activation may be used in light of this disclosure.

As shown in FIG. 1, the linking molecules are preferably exposed to, for example, light through a
suitable mask using photolithographic techniques well known in the semiconductor industry and
described in, for example, Sze, VLSI Technology, McGraw-Hill (1983), and Mead et al.,
Introduction to VLSI Systems, Addison-Wesley (1980), which are incorporated herein by reference
for all purposes. The light may be directed at either the surface containing the protective
groups or at the back of the substrate, so long as the substrate is transparent to the wavelength
of light needed for removal of the protective groups. In the embodiment shown in FIG. 1, light is
directed at the surface of the substrate containing the protective groups. FIG. 1 illustrates the
use of such masking techniques as they are applied to a positive reactive group so as to activate
linking molecules and expose functional groups in areas 10a and 10b.

The mask 8 is in one embodiment a transparent support material selectively coated with a layer of
opaque material. Portions of the opaque material are removed, leaving opaque material in the
precise pattern desired on the substrate surface. The mask is brought into close proximity with,
imaged on, or brought directly into contact with the substrate surface as shown in FIG. 1.
"Openings" in the mask correspond to locations on the substrate where it is desired to remove
photoremovable protective groups from the substrate. Alignment may be performed using
conventional alignment techniques in which alignment marks (not shown) are used to accurately
overlay successive masks with previous patterning steps, or more sophisticated techniques may be
used. For example, interferometric techniques such as the one described in Flanders et al., "A
New Interferometric Alignment Technique," App. Phys. Lett. (1977) 31:426-428, which is
incorporated herein by reference, may be used.

To enhance contrast of light applied to the substrate, it is desirable to provide contrast
enhancement materials between the mask and the substrate according to some embodiments. This
contrast enhancement layer may comprise a molecule which is decomposed by light such as quinone
diazide or a material which is transiently bleached at the wavelength of interest. Transient
bleaching of materials will allow greater penetration where light is applied, thereby enhancing
contrast. Alternatively, contrast enhancement may be provided by way of a cladded fiber optic
bundle.

The light may be from a conventional incandescent source, a laser, a laser diode, or the like. If
non-collimated sources of light are used it may be desirable to provide a thick- or multi-layered
mask to prevent spreading of the light onto the substrate. It may, further, be desirable in some
embodiments to utilize groups which are sensitive to different wavelengths to control synthesis.
For example, by using groups which are sensitive to different wavelengths, it is possible to
select branch positions in the synthesis of a polymer or eliminate certain masking steps. Several
reactive groups along with their corresponding wavelengths for deprotection are provided in
Table 1.

TABLE 1

Group                                      Approximate Deprotection Wavelength
Nitroveratryloxy carbonyl (NVOC)           UV (300-400 nm)
Nitrobenzyloxy carbonyl (NBOC)             UV (300-350 nm)
Dimethyl dimethoxybenzyloxy carbonyl       UV (280-300 nm)
5-Bromo-7-nitroindolinyl                   UV (420 nm)
o-Hydroxy-α-methyl cinnamoyl               UV (300-350 nm)
2-Oxymethylene anthraquinone               UV (350 nm)

While the invention is illustrated primarily herein by way of the use of a mask to illuminate
selected regions of the substrate, other techniques may also be used. For example, the substrate
may be translated under a modulated laser or diode light source. Such techniques are discussed
in, for example, U.S. Pat. No. 4,[illegible],615, which is incorporated herein by reference. In
alternative embodiments a laser galvanometric scanner is utilized. In other embodiments, the
synthesis may take place on or in contact with a conventional liquid crystal (referred to herein
as a "light valve") or fiber optic light sources. By appropriately modulating liquid crystals,
light may be selectively controlled so as to permit light to contact selected regions of the
substrate. Alternatively, synthesis may take place on the end of a series of optical fibers to
which light is selectively applied. Other means of controlling the location of light exposure
will be apparent to those of skill in the art.

The substrate may be irradiated either in contact or not in contact with a solution (not shown)
and is, preferably, irradiated in contact with a solution. The solution contains reagents to
prevent the by-products formed by irradiation from interfering with synthesis of the polymer
according to some embodiments. Such by-products might include, for example, carbon dioxide,
nitrosocarbonyl compounds, styrene derivatives, indole derivatives, and products of their
photochemical reactions. Alternatively, the solution may contain reagents used to match the index
of refraction of the substrate. Reagents added to the solution may further include, for example,
acidic or basic buffers, thiols, substituted hydrazines and hydroxylamines, reducing agents
(e.g., NADH) or reagents known to react with a given functional group (e.g., aryl nitroso +
glyoxylic acid → aryl formhydroxamate + CO2).
Either concurrently with or after the irradiation step, the linker molecules are washed or
otherwise contacted with a first monomer, illustrated by "A" in regions 12a and 12b in FIG. 2.
The first monomer reacts with the activated functional groups of the linkage molecules which have
been exposed to light. The first monomer, which is preferably an amino acid, is also provided
with a photoprotective group. The photoprotective group on the monomer may be the same as or
different than the protective group used in the linkage molecules, and may be selected from any
of the above-described protective groups. In one embodiment, the protective group for the A
monomer is selected from the group NBOC and NVOC.

As shown in FIG. 3, the process of irradiating is thereafter repeated, with a mask repositioned
so as to remove linkage protective groups and expose functional groups in regions 14a and 14b,
which are illustrated as being regions which were protected in the previous masking step. As an
alternative to repositioning of the first mask, in many embodiments a second mask will be
utilized. In other alternative embodiments, some steps may provide for illuminating a common
region in successive steps. As shown in FIG. 3, it may be desirable to provide separation between
irradiated regions. For example, separation of about 1-5 µm may be appropriate to account for
alignment tolerances.

As shown in FIG. 4, the substrate is then exposed to a second protected monomer "B," producing B
regions 16a and 16b. Thereafter, the substrate is again masked so as to remove the protective
groups and expose reactive groups on A region 12a and B region 16b. The substrate is again
exposed to monomer B, resulting in the formation of the structure shown in FIG. 6. The dimers B-A
and B-B have been produced on the substrate.

A subsequent series of masking and contacting steps similar to those described above with A (not
shown) provides the structure shown in FIG. 7. The process provides all possible dimers of B and
A, i.e., B-A, A-B, A-A, and B-B.

The substrate, the area of synthesis, and the area for synthesis of each individual polymer could
be of any size or shape. For example, squares, ellipsoids, rectangles, triangles, circles, or
portions thereof, along with irregular geometric shapes, may be utilized. Duplicate synthesis
areas may also be applied to a single substrate for purposes of redundancy.

In one embodiment the regions 12a, 12b and 16a, 16b on the substrate will have a surface area of
between about 1 cm^2 and 10^-10 cm^2. In some embodiments the regions 12a, 12b and 16a, 16b have
areas of less than about 10^-1 cm^2, 10^-2 cm^2, 10^-3 cm^2, 10^-4 cm^2, 10^-5 cm^2, 10^-6 cm^2,
10^-7 cm^2, 10^-8 cm^2, or 10^-10 cm^2. In a preferred embodiment, the regions 12a, 12b and 16a,
16b are between about 10 x 10 µm and 500 x 500 µm.

In some embodiments a single substrate supports more than about 10 different monomer sequences
and preferably more than about 100 different monomer sequences, although in some embodiments more
than about 10^3, 10^4, 10^5, 10^6, 10^7, or 10^8 different sequences are provided on a substrate.
Of course, within a region of the substrate in which a monomer sequence is synthesized, it is
preferred that the monomer sequence be substantially pure. In some embodiments, regions of the
substrate contain polymer sequences which are at least about 1%, 5%, 10%, 15%, 20%, 25%, 30%,
35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, 95%, 96%, 97%, 98% or 99% pure.

According to some embodiments, several sequences are intentionally provided within a single
region so as to provide an initial screening for biological activity, after which materials
within regions exhibiting significant binding are further evaluated.

IV. Details of One Embodiment of a Reactor System

FIG. 8A schematically illustrates a preferred embodiment of a reactor system 100 for synthesizing
polymers on the prepared substrate in accordance with one aspect of the invention. The reactor
system includes a body 102 with a cavity 104 on a surface thereof. In preferred embodiments the
cavity 104 is between about 50 and 1000 µm deep with a depth of about 500 µm preferred.

The bottom of the cavity is preferably provided with an array of ridges 106 which extend both
into the plane of the Figure and parallel to the plane of the Figure. The ridges are preferably
about 50 to 200 µm deep and spaced at about 2 to 3 mm. The purpose of the ridges is to generate
turbulent flow for better mixing. The bottom surface of the cavity is preferably light absorbing
so as to prevent reflection of impinging light.

A substrate 112 is mounted above the cavity 104. The substrate is provided along its bottom
surface 114 with a photoremovable protective group such as NVOC with or without an intervening
linker molecule. The substrate is preferably transparent to a wide spectrum of light, but in some
embodiments is transparent only at a wavelength at which the protective group may be removed
(such as UV in the case of NVOC). The substrate in some embodiments is a conventional microscope
glass slide or cover slip. The substrate is preferably as thin as possible, while still providing
adequate physical support. Preferably, the substrate is less than about 1 mm thick, more
preferably less than 0.5 mm thick, more preferably less than 0.1 mm thick, and most preferably
less than 0.05 mm thick. In alternative preferred embodiments, the substrate is quartz or
silicon.

The substrate and the body serve to seal the cavity except for an inlet port 108 and an outlet
port 110. The body and the substrate may be mated for sealing in some embodiments with one or
more gaskets. According to a preferred embodiment, the body is provided with two concentric
gaskets and the intervening space is held at vacuum to ensure mating of the substrate to the
gaskets.

Fluid is pumped through the inlet port into the cavity by way of a pump 116 which may be, for
example, a model no. B-120-S made by Eldex Laboratories. Selected fluids are circulated into the
cavity by the pump, through the cavity, and out the outlet for recirculation or disposal. The
reactor may be subjected to ultrasonic radiation and/or heated to aid in agitation in some
embodiments.

Above the substrate 112, a lens 120 is provided which may be, for example, a 2" 100 mm focal
length fused silica lens. For the sake of a compact system, a reflective mirror 122 may be
provided for directing light from a light source 124 onto the substrate. Light source 124 may be,
for example, a Xe(Hg) light source manufactured by Oriel and having model no. 66024. A second
lens 126 may be provided for the purpose of projecting a mask image onto the substrate in
combination with lens 120. This form of lithography is referred to herein as projection printing.
As will be apparent from this disclosure, proximity printing and the like may also be used
according to some embodiments.

Light from the light source is permitted to reach only selected locations on the substrate as a
result of mask 128. Mask 128 may be, for example, a glass slide having



etched chrome thereon. The mask 128 in one embodi-
ment is provided with a grid of transparent locations
and opaque locations. Such masks may be manufactured
by, for example, Photo Sciences, Inc. Light passes
freely through the transparent regions of the mask, but
is reflected from or absorbed by other regions. There-
fore, only selected regions of the substrate are exposed
to light.

As discussed above, light valves (LCDs) may be
used as an alternative to conventional masks to selec-
tively expose regions of the substrate. Alternatively,
fiber optic faceplates such as those available from
Schott Glass, Inc., may be used for the purpose of con-
trast enhancement of the mask or as the sole means of
restricting the region to which light is applied. Such
faceplates would be placed directly above or on the
substrate in the reactor shown in FIG. 8A. In still fur-
ther embodiments, fly's-eye lenses, tapered fiber optic
faceplates, or the like, may be used for contrast en-
hancement.

In order to provide for illumination of regions smaller
than a wavelength of light, more elaborate techniques
may be utilized. For example, according to one pre-
ferred embodiment, light is directed at the substrate by
way of molecular microcrystals on the tip of, for exam-
ple, micropipettes. Such devices are disclosed in Lieber-
man et al., "A Light Source Smaller Than the Optical
Wavelength," Science (1990) 247:59-61, which is incor-
porated herein by reference for all purposes.

In operation, the substrate is placed on the cavity and
sealed thereto. All operations in the process of prepar-
ing the substrate are carried out in a room lit primarily
or entirely by light of a wavelength outside of the light
range at which the protective group is removed. For
example, in the case of NVOC, the room should be lit
with a conventional dark room light which provides
little or no UV light. All operations are preferably con-
ducted at about room temperature.

A first, deprotection fluid (without a monomer) is
circulated through the cavity. The solution is preferably
5 mM sulfuric acid in dioxane, which serves
to keep exposed amino groups protonated and decreases
their reactivity with photolysis by-products. Absorp-
tive materials such as N,N-diethylamino 2,4-dinitroben-
zene, for example, may be included in the deprotection
fluid to absorb light and prevent reflection
and unwanted photolysis.

The slide is, thereafter, positioned in a light raypath
from the mask such that first locations on the substrate
are illuminated and, therefore, deprotected. In pre-
ferred embodiments the substrate is illuminated for be-
tween about 1 and 15 minutes with a preferred illumina-
tion time of about 10 minutes at 10-20 mW/cm^2 with
365 nm light. The slides are neutralized (i.e., brought to
a pH of about 7) after photolysis with, for example, a
solution of di-isopropylethylamine (DIEA) in methy-
lene chloride for about 5 minutes.
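
The illumination conditions quoted above imply an energy dose per unit area (irradiance multiplied by exposure time). The following Python sketch works through that arithmetic only; the function name is hypothetical, and the inputs are the 10 minute exposure and the 10-20 mW/cm^2 range stated in the text.

```python
def exposure_dose_j_per_cm2(irradiance_mw_per_cm2: float, minutes: float) -> float:
    """Energy delivered per unit area, in J/cm^2, for a constant irradiance."""
    seconds = minutes * 60.0
    # mW/cm^2 * s = mJ/cm^2; divide by 1000 to express the dose in J/cm^2.
    return irradiance_mw_per_cm2 * seconds / 1000.0


if __name__ == "__main__":
    for irradiance in (10.0, 20.0):
        dose = exposure_dose_j_per_cm2(irradiance, minutes=10.0)
        print(f"{irradiance:.0f} mW/cm^2 for 10 min -> {dose:.1f} J/cm^2")
```

Under these assumptions the preferred 10 minute exposure corresponds to roughly 6 to 12 J/cm^2 at the substrate.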

The first monomer is then placed at the first locations
on the substrate. After irradiation, the slide is removed,
treated in bulk, and then reinstalled in the flow cell.
Alternatively, a fluid containing the first monomer,
preferably also protected by a protective group, is cir-
culated through the cavity by way of pump 116. If, for
example, it is desired to attach the amino acid Y to the
substrate at the first locations, the amino acid Y (bearing
a protective group on its α-nitrogen), along with rea-
gents used to render the monomer reactive, and/or a
carrier, is circulated from a storage container 118,
through the pump, through the cavity, and back to the
inlet of the pump.

The monomer carrier solution is, in a preferred em-
bodiment, formed by mixing of a first solution (referred
to herein as solution "A") and a second solution (re-
ferred to herein as solution "B"). Table 2 provides an
illustration of a mixture which may be used for solution
A.

TABLE 2

Representative Monomer Carrier Solution "A"

100 mg NVOC amino protected amino acid
37 mg HOBT (1-Hydroxybenzotriazole)
250 µl DMF (Dimethylformamide)
86 µl DIEA (Diisopropylethylamine)



The composition of solution B is illustrated in Table
3. Solutions A and B are mixed and allowed to react at
room temperature for about 8 minutes, then diluted
with 2 ml of DMF, and 500 µl are applied to the surface
of the slide or the solution is circulated through the
reactor system and allowed to react for about 2 hours at
room temperature. The slide is then washed with DMF,
methylene chloride and ethanol.

TABLE 3

Representative Monomer Carrier Solution "B"

250 µl DMF
111 mg BOP (Benzotriazolyl-n-oxy-tris(dimethylamino)
phosphonium hexafluorophosphate)
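
The quantities in Tables 2 and 3 can be combined into a rough volume and concentration estimate for the diluted coupling mixture. The Python sketch below is illustrative bookkeeping only: the constant and function names are hypothetical, liquid volumes are assumed to be additive, and the volume contributed by the dissolved solids (amino acid, HOBT, BOP) is assumed to be negligible.

```python
# Liquid components taken from Tables 2 and 3, in microliters.
SOLUTION_A_UL = {"DMF": 250.0, "DIEA": 86.0}
SOLUTION_B_UL = {"DMF": 250.0}
DILUENT_DMF_UL = 2000.0   # "diluted with 2 ml of DMF"
AMINO_ACID_MG = 100.0     # NVOC-protected amino acid in solution A
ALIQUOT_UL = 500.0        # volume applied to the slide surface


def total_volume_ul() -> float:
    """Approximate volume of the A + B mixture after dilution (solids ignored)."""
    return sum(SOLUTION_A_UL.values()) + sum(SOLUTION_B_UL.values()) + DILUENT_DMF_UL


def amino_acid_mg_per_ml() -> float:
    """Approximate protected amino acid concentration in the diluted mixture."""
    return AMINO_ACID_MG / (total_volume_ul() / 1000.0)


def amino_acid_mg_in_aliquot() -> float:
    """Approximate mass of protected amino acid in the applied aliquot."""
    return amino_acid_mg_per_ml() * ALIQUOT_UL / 1000.0


if __name__ == "__main__":
    print(f"total volume: {total_volume_ul():.0f} ul")
    print(f"amino acid:   {amino_acid_mg_per_ml():.1f} mg/ml")
    print(f"per aliquot:  {amino_acid_mg_in_aliquot():.1f} mg")
```

With these assumptions the diluted mixture is about 2.6 ml, so each 500 µl aliquot carries roughly one fifth of the protected amino acid.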



As the solution containing the monomer to be at- 
tached is circulated through the cavity, the amino acid 
or other monomer will react at its carboxy terminus 
with amino groups on the regions of the substrate which 
have been deprotected. Of course, while the invention 
is illustrated by way of circulation of the monomer 
through the cavity, the invention could be practiced by 
way of removing the slide from the reactor and sub- 
mersing it in an appropriate monomer solution. 

After addition of the first monomer, the solution
containing the first amino acid is then purged from the
system. After circulation of a sufficient amount of the
DMF/methylene chloride such that removal of the
amino acid can be assured (e.g., about 50 times the
volume of the cavity and carrier lines), the mask or
substrate is repositioned, or a new mask is utilized such
that second regions on the substrate will be exposed to
light and the light source 124 is engaged for a second exposure.
This will deprotect second regions on the substrate and
the process is repeated until the desired polymer se-
quences have been synthesized.

The entire derivatized substrate is then exposed to a 
receptor of interest, preferably labeled with, for exam- 
ple, a fluorescent marker, by circulation of a solution or 
suspension of the receptor through the cavity or by 
contacting the surface of the slide in bulk. The receptor 
will preferentially bind to certain regions of the sub- 
strate which contain complementary sequences. 

Antibodies are typically suspended in what is com-
monly referred to as "supercocktail," which may be, for
example, a solution of about 1% BSA (bovine serum
albumin), 0.5% Tween™ non-ionic detergent in PBS
(phosphate buffered saline) buffer. The antibodies are
diluted into the supercocktail buffer to a final concen-
tration of, for example, about 0.1 to 4 µg/ml.
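
The dilution step above follows the usual C1·V1 = C2·V2 relation. The Python sketch below shows that calculation; the function name and the 1 mg/ml stock concentration are assumptions made for illustration and are not stated in the text.

```python
def stock_volume_ul(stock_ug_per_ml: float, final_ug_per_ml: float,
                    final_volume_ml: float) -> float:
    """Volume of antibody stock (in microliters) needed for the target dilution.

    Uses C1 * V1 = C2 * V2, where the final volume includes the stock itself.
    """
    if final_ug_per_ml > stock_ug_per_ml:
        raise ValueError("final concentration cannot exceed the stock concentration")
    stock_volume_ml = final_ug_per_ml * final_volume_ml / stock_ug_per_ml
    return stock_volume_ml * 1000.0


if __name__ == "__main__":
    STOCK_UG_PER_ML = 1000.0  # hypothetical 1 mg/ml antibody stock
    for target in (0.1, 4.0):  # range quoted in the text, in ug/ml
        vol = stock_volume_ul(STOCK_UG_PER_ML, target, final_volume_ml=1.0)
        print(f"{target} ug/ml in 1 ml of supercocktail: add {vol:.2f} ul of stock")
```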

FIG. 8B illustrates an alternative preferred embodi- 
ment of the reactor shown in FIG. 8A. According to 




this embodiment, the mask 128 is placed directly in contact with the substrate. Preferably, the
etched portion of the mask is placed face down so as to reduce the effects of light dispersion.
According to this embodiment, the imaging lenses 120 and 126 are not necessary because the mask
is brought into close proximity with the substrate.

For purposes of increasing the signal-to-noise ratio of the technique, some embodiments of the
invention provide for exposure of the substrate to a first labeled or unlabeled receptor followed
by exposure of a labeled, second receptor (e.g., an antibody) which binds at multiple sites on
the first receptor. If, for example, the first receptor is an antibody derived from a first
species of an animal, the second receptor is an antibody derived from a second species directed
to epitopes associated with the first species. In the case of a mouse antibody, for example,
fluorescently labeled goat antibody or antiserum which is antimouse may be used to bind at
multiple sites on the mouse antibody, providing several times the fluorescence compared to the
attachment of a single mouse antibody at each binding site. This process may be repeated again
with additional antibodies (e.g., goat-mouse-goat, etc.) for further signal amplification.

In preferred embodiments an ordered sequence of masks is utilized. In some embodiments it is
possible to use as few as a single mask to synthesize all of the possible polymers of a given
monomer set.

If, for example, it is desired to synthesize all 16 dinucleotides from four bases, a 1 cm square
synthesis region is divided conceptually into 16 boxes, each 0.25 cm wide. Denote the four
monomer units by A, B, C, and D. The first reactions are carried out in four vertical columns,
each 0.25 cm wide. The first mask exposes the left-most column of boxes, where A is coupled. The
second mask exposes the next column, where B is coupled; followed by a third mask, for the C
column; and a final mask that exposes the right-most column, for D. The first, second, third, and
fourth masks may be a single mask translated to different locations.

The process is repeated in the horizontal direction for the second unit of the dimer. This time,
the masks allow exposure of horizontal rows, again 0.25 cm wide. A, B, C, and D are sequentially
coupled using masks that expose horizontal fourths of the reaction area. The resulting substrate
contains all 16 dinucleotides of four bases.

The eight masks used to synthesize the dinucleotide are related to one another by translation or
rotation. In fact, one mask can be used in all eight steps if it is suitably rotated and
translated. For example, in the example above, a mask with a single transparent region could be
sequentially used to expose each of the vertical columns, rotated 90°, and then sequentially used
to allow exposure of the horizontal rows.

Tables 4 and 5 provide a simple computer program in Quick Basic for planning a masking program
and a sample output, respectively, for the synthesis of a polymer chain of three monomers
("residues") having three different monomers in the first level, four different monomers in the
second level, and five different monomers in the third level in a striped pattern. The output of
the program is the number of cells, the number of "stripes" (light regions) on each mask, and the
amount of translation required for each exposure of the mask.

TABLE 4

Mask Strategy Program

    DEFINT A-Z
    DIM b(20), w(20), l(500)
    F$ = "LPT1:"
    OPEN F$ FOR OUTPUT AS #1
    jmax = 3 'Number of residues
    b(1) = 3: b(2) = 4: b(3) = 5 'Number of building blocks for res 1,2,3
    g = 1: lmax(1) = 1
    FOR j = 1 TO jmax: g = g * b(j): NEXT j
    w(0) = 0: w(1) = g / b(1)
    PRINT #1, "MASK2.BAS ", DATE$, TIME$: PRINT #1,
    PRINT #1, USING "Number of residues=##"; jmax
    FOR j = 1 TO jmax
    PRINT #1, USING "    Residue ##    ## building blocks"; j; b(j)
    NEXT j
    PRINT #1, " "
    PRINT #1, USING "Number of cells=####"; g: PRINT #1,
    FOR j = 2 TO jmax
    lmax(j) = lmax(j - 1) * b(j - 1)
    w(j) = w(j - 1) / b(j)
    NEXT j
    FOR j = 1 TO jmax
    PRINT #1, USING "Mask for residue ##"; j: PRINT #1,
    PRINT #1, USING "    Number of stripes=###"; lmax(j)
    PRINT #1, USING "    Width of each stripe=###"; w(j)
    FOR l = 1 TO lmax(j)
    a = 1 + (l - 1) * w(j - 1)
    ae = a + w(j) - 1
    PRINT #1, USING "    Stripe ## begins at location ### and ends at ###"; l; a; ae
    NEXT l
    PRINT #1,
    PRINT #1, USING "    For each of ## building blocks, translate mask by ## cell(s)"; b(j); w(j),
    PRINT #1, : PRINT #1, : PRINT #1,
    NEXT j

© Copyright 1990, Affymax Research Institute

TABLE 5

Masking Strategy Output

Residue 1    3 building blocks
Residue 2    4 building blocks
Residue 3    5 building blocks



Number of cells = 60

Mask for residue 1
    Number of stripes = 1
    Width of each stripe = 20
    Stripe 1 begins at location 1 and ends at 20
    For each of 3 building blocks, translate mask by 20 cell(s)

Mask for residue 2
    Number of stripes = 3
    Width of each stripe = 5
    Stripe 1 begins at location 1 and ends at 5
    Stripe 2 begins at location 21 and ends at 25
    Stripe 3 begins at location 41 and ends at 45
    For each of 4 building blocks, translate mask by 5 cell(s)

Mask for residue 3
    Number of stripes = 12
    Width of each stripe = 1
    Stripe 1 begins at location 1 and ends at 1
    Stripe 2 begins at location 6 and ends at 6
    Stripe 3 begins at location 11 and ends at 11
    Stripe 4 begins at location 16 and ends at 16
    Stripe 5 begins at location 21 and ends at 21
    Stripe 6 begins at location 26 and ends at 26
    Stripe 7 begins at location 31 and ends at 31
    Stripe 8 begins at location 36 and ends at 36
    Stripe 9 begins at location 41 and ends at 41
    Stripe 10 begins at location 46 and ends at 46
    Stripe 11 begins at location 51 and ends at 51
    Stripe 12 begins at location 56 and ends at 56
    For each of 5 building blocks, translate mask by 1 cell(s)

© Copyright 1990, Affymax Research Institute
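
For readers without a Quick Basic interpreter, the stripe arithmetic behind the sample output can be sketched in Python. This is an illustrative port rather than the original listing: the function name `mask_strategy` and the dictionary layout are hypothetical, while the recurrences mirror the `w(j)` and `lmax(j)` updates of the Quick Basic program.

```python
def mask_strategy(blocks):
    """Plan striped masks for a polymer with len(blocks) residue positions.

    blocks[i] is the number of building blocks at residue position i + 1.
    Returns the total number of cells and one plan per residue position.
    """
    cells = 1
    for b in blocks:
        cells *= b

    plans = []
    n_stripes = 1   # lmax(1) = 1 in the BASIC listing
    period = 0      # w(j - 1); w(0) = 0, so the first mask has a single stripe
    width = cells
    for b in blocks:
        width //= b  # w(1) = cells / b(1), then w(j) = w(j - 1) / b(j)
        # Stripe k (0-based) begins at 1 + k * period and spans `width` cells.
        stripes = [(1 + k * period, k * period + width) for k in range(n_stripes)]
        plans.append({"blocks": b, "width": width,
                      "translate_by": width, "stripes": stripes})
        n_stripes *= b   # lmax(j + 1) = lmax(j) * b(j)
        period = width
    return cells, plans


if __name__ == "__main__":
    cells, plans = mask_strategy([3, 4, 5])
    print(f"Number of cells = {cells}")
    for j, plan in enumerate(plans, start=1):
        print(f"Mask for residue {j}: {len(plan['stripes'])} stripe(s), "
              f"width {plan['width']}")
        for k, (start, end) in enumerate(plan["stripes"], start=1):
            print(f"  Stripe {k} begins at location {start} and ends at {end}")
        print(f"  For each of {plan['blocks']} building blocks, "
              f"translate mask by {plan['translate_by']} cell(s)")
```

Each mask is reused once per building block at that residue position and shifted by one stripe width between exposures, which is why the translation distance equals the stripe width.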
V. Details of One Embodiment of a Fluorescent Detection Device

FIG. 9 illustrates a fluorescent detection device for detecting fluorescently labeled receptors
on a substrate. A substrate 112 is placed on an x/y translation table 202. In a preferred
embodiment the x/y translation table is a model no. PM500-A1 manufactured by Newport Corporation.
The x/y translation table is connected to and controlled by an appropriately programmed digital
computer 204 which may be, for example, an appropriately programmed IBM PC/AT or AT compatible
computer. Of course, other computer systems, special purpose hardware, or the like could readily
be substituted for the AT computer used herein for illustration. Computer software for the
translation and data collection functions described herein can be provided based on commercially
available software including, for example, "Lab Windows" licensed by National Instruments, which
is incorporated herein by reference for all purposes.

The substrate and x/y translation table are placed under a microscope 206 which includes one or
more objectives 208. Light (about 488 nm) from a laser 210, which in some embodiments is a model
no. 2020-05 argon ion laser manufactured by Spectraphysics, is directed at the substrate by a
dichroic mirror 207 which passes greater than about 520 nm light but reflects 488 nm light.
Dichroic mirror 207 may be, for example, a model no. FT510 manufactured by Carl Zeiss. Light
reflected from the mirror then enters the microscope 206 which may be, for example, a model no.
Axioscop 20 manufactured by Carl Zeiss. Fluorescein-marked materials on the substrate will
fluoresce >488 nm light, and the fluoresced light will be collected by the microscope and passed
through the mirror. The fluorescent light from the substrate is then directed through a
wavelength filter 209 and, thereafter, through an aperture plate 211. Wavelength filter 209 may
be, for example, a model no. OG530 manufactured by Melles Griot and aperture plate 211 may be,
for example, a model no. 477352/477380 manufactured by Carl Zeiss.

The fluoresced light then enters a photomultiplier tube 212, which in some embodiments is a model
no. R943-02 manufactured by Hamamatsu; the signal is amplified in preamplifier 214 and photons
are counted by photon counter 216. The number of photons is recorded as a function of the
location in the computer 204. Pre-Amp 214 may be, for example, a model no. SR440 manufactured by
Stanford Research Systems and photon counter 216 may be a model no. SR400 manufactured by
Stanford Research Systems. The substrate is then moved to a subsequent location and the process
is repeated. In preferred embodiments the data are acquired every 1 to 100 µm with a data
collection diameter of about 0.8 to 10 µm preferred. In embodiments with sufficiently high
fluorescence, a CCD (charge coupled device) detector with broadfield illumination is utilized.

By counting the number of photons generated in a given area in response to the laser, it is
possible to determine where fluorescent marked molecules are located on the substrate.
Consequently, for a slide which has a matrix of polypeptides, for example, synthesized on the
surface thereof, it is possible to determine which of the polypeptides is complementary to a
fluorescently marked receptor.

According to preferred embodiments, the intensity and duration of the light applied to the
substrate is controlled by varying the laser power and scan stage rate for improved
signal-to-noise ratio by maximizing fluorescence emission and minimizing background noise.

While the detection apparatus has been illustrated primarily herein with regard to the detection
of marked receptors, the invention will find application in other areas. For example, the
detection apparatus disclosed herein could be used in the fields of catalysis, DNA or protein gel
scanning, and the like.

VI. Determination of Relative Binding Strength of Receptors

The signal-to-noise ratio of the present invention is sufficiently high that not only can the
presence or absence of a receptor on a ligand be detected, but also the relative binding affinity
of receptors to a variety of sequences can be determined.

In practice it is found that a receptor will bind to several peptide sequences in an array, but
will bind much more strongly to some sequences than others. Strong binding affinity will be
evidenced herein by a strong fluorescent or radiographic signal since many receptor molecules
will bind in a region of a strongly bound ligand. Conversely, a weak binding affinity will be
evidenced by a weak fluorescent or radiographic signal due to the relatively small number of
receptor molecules which bind in a particular region of a substrate having a ligand with a weak
binding affinity for the receptor. Consequently, it becomes possible to determine relative
binding avidity (or affinity in the case of univalent interactions) of a ligand herein by way of
the intensity of a fluorescent or radiographic signal in a region containing that ligand.

Semiquantitative data on affinities might also be obtained by varying washing conditions and
concentrations of the receptor. This would be done by comparison to known ligand receptor pairs,
for example.

VII. Examples

The following examples are provided to illustrate the efficacy of the inventions herein. All
operations were
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conducted at about ambient temperatures and pressures is to be added, with appropriate washes to remove 

unless indicated to the contrary. the by-products of the deprotection. 

A. Slide Preparation 2. Addition of a single activated and protected (with 
Before attachment of reactive groups it is preferred to the same photochemically-removable group) mon- 

clean the substrate which is, in a preferred embodiment 5 omer, which will react only at the sites addressed 

a glass substrate such as a microscope slide or cover in step 1, with appropriate washes to remove the 

slip. According to one embodiment the slide is soaked in excess reagent from the surface, 

an alkaline bath consisting of, for example, 1 liter of The above cycle is repeated for each member of the 

95% ethanol with 120 ml of water and 120 grams of monomer set until each location on the surface has been

sodium hydroxide for 12 hours. The slides are then extended by one residue in one embodiment. In other

washed under running water and allowed to air dry, embodiments, several residues are sequentially added at

and rinsed once with a solution of 95% ethanol. one location before moving on to the next location.

The slides are then aminated with, for example, Cyde ^ generally be limited by the coupling 

aminopropyltriethoxysilane for the purpose of attach- reaction ratej now & short as 20 min in automated pep- 

ing amino groups to the glass surface on linker mole- 15 tide synthesizers . ste p is optionally followed by 

cules although any omega functionahzed silane could addition of a protec ting group to stabilize the array for 

also be used for this purpose. In one embo<hment 0.1 * h ^ Fof SQme of j ( 

a^opropyltnemoxysilane is u^d although solu- a f ma i deprotection of the entire surface (removal 

tions with concentrations from 10~~% to 10% may be . . . . . . ^ mw . 

used, with about 10-3% to 2% preferred. A 0.1% mix- 20 of photoprotec tive side chain groups) may be ^raL 

ture is prepared by adding to 100 ml of a 95% . ^Jf 1 ^ 

ethanol/5% water mixture, 100 microhters (ul) of * Provided ™th re^ons 22^4, 26, 28 30, 32 34, and 
arninopropyltriethoxysilane. The mixture is agitated at ^J^ns 30, 32, 34, and 36 are masked, as shown in 
about ambient temperature on a rotary shaker for about ™. 10B and the g^ass is irradiated and exposed to a 
5 minutes. 500 ul of this mixture is then applied to the 25 conVmg "A" (e.g., gly), with the resulting 
surface of one side of each cleaned slide. After 4 min- structure shown in FIG. IOC. Thereafter, regions 22, 
utes, the slides are decanted of this solution and rinsed 24. 26 > ^ 28 m masked, the glass is irradiated (as 
three times by dipping in, for example, 100% ethanoi. shown in FIG. 10D) and exposed to a reagent contain- 
After the plates dry, they are placed in a 1 10°-120 ft C. ing M B" (e.g., phe), with the resulting structure shown 
vacuum oven for about 20 minutes, and then allowed to 30 in FIG. 10E. The process proceeds, consecutively 
cure at room temperature for about 12 hours in an argon masking and exposing the sections as shown until the 
environment. The slides are then dipped into DMF structure shown in FIG. 10M is obtained. The glass is 
(dimethyifonnamide) solution, followed by a thorough irradiated and the terminal groups are, optionally, 
washing with methylene chloride. capped by acetylation. As shown, all possible trimers of 

The aminated surface of the slide is then exposed to 35 gly/phe are obtained, 
about 500 ul of, for example, a 30 niillimolar (mM) In this example, no side chain protective group re- 
solution of NVOC-GABA (gamma amino butyric acid) moval is necessary. If it is desired, side chain deprotec- 
NHS (N-hydroxysuccinimide) in DMF for attachment tion may be accomplished by treatment with ethanedi- 
of a NVOC-GABA to each of the amino groups. thiol and trifluoroacetic acid. 

The surface is washed with, for example, DMF, 40 In general, the number of steps needed to obtain a 

methylene chloride, and ethanoi. particular polymer chain is defined by: 

Any unreacted aminopropyl silane on the surface — 

that is, those amino groups which have not had the nxl 0) 
NVOC-GABA attached — are now capped with acetyl 

groups (to prevent further reaction) by exposure to a 1:3 45 where: 

mixture of acetic anhydride in pyridine for 1 hour. n=the number of monomers in the basis set of mono- 
Other materials which may perform this residual cap- mers, and 

ping function include trifluoroacetic anhydride, for- l=the number of monomer units in a polymer chain, 

micacetic anhydride, or other reactive acylating agents. Conversely, the synthesized number of sequences of 

Finally, the slides are washed again with DMF, methy- 50 length 1 will be: 
lene chloride, and ethanoi. 

B. Synthesis of Eight Trimers of "A" and "B" n'. (2) 
FIG. 10 illustrates a possible synthesis of the eight 

trimers of the two-monomer set: gly, phe (represented Of course, greater diversity is obtained by using 

by "A" and "B," respectively). A glass slide bearing 55 masking strategies which will also include the synthesis 

silane groups terminating in 6-nitroveratryloxycarboxa- of polymers having a length of less than 1. If, in the 

mide (NVOC-NH) residues is prepared as a substrate. extreme case, all polymers having a length less than or 

Active esters (pentafluorophenyl, OBt, etc.) of gly and equal to 1 are synthesized, the number of polymers syn- 

phe protected at the amino group with NVOC are pre- thesized will be: 
pared as reagents. While not pertinent to this example, if 60 

side chain protecting groups are required for the mono- n'+n' - 1 + . . . +nl (3) 
mer set, these must not be photoreactive at the wave- 
length of light used to protect the primary chain. The maximum number of lithographic steps needed 

For a monomer set of size n, nXl cycles are required will generally be n for each "layer" of monomers, i.e., 

to synthesize all possible sequences of length 1. A cycle 65 the total number of masks (and, therefore, the number 

consists of: of lithographic steps) needed will be nXl. The size of 

1. Irradiation through an appropriate mask to expose the transparent mask regions will vary in accordance 

the amino groups at the sites where the next residue with the area of the substrate available for synthesis and 
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the number of sequences to be formed. In general, the 
sue of the synthesis areas will be: 

size of synthesis areas =(A)/(Sequences) 

5 

where: 

A is the total area, available for synthesis; and 
Sequences is the number of sequences desired in the 
area. 
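
As a minimal sketch of the counting relations above, the following
Python fragment evaluates the number of cycles, the number of
sequences of length l, the number of polymers of length up to l, and
the size of each synthesis area. The function names
`synthesis_counts` and `synthesis_area` are hypothetical and are
introduced only for illustration; n=2 and l=3 correspond to the
gly/phe trimer example, and the total area of 1 cm^2 is an assumed
figure.

```python
def synthesis_counts(n: int, l: int) -> dict:
    """Evaluate the counting relations for n monomers and chain length l.

    Hypothetical helper for illustration only.
    """
    if n < 1 or l < 1:
        raise ValueError("n and l must be positive integers")
    return {
        # n x l cycles (and masks) to build every sequence of length l
        "cycles": n * l,
        # n**l distinct sequences of length l
        "sequences": n ** l,
        # n**l + n**(l-1) + ... + n**1 polymers of length <= l
        "sequences_up_to_length": sum(n ** k for k in range(1, l + 1)),
    }


def synthesis_area(total_area: float, sequences: int) -> float:
    """Size of each synthesis area = A / Sequences (same units as A)."""
    if sequences < 1:
        raise ValueError("sequences must be a positive integer")
    return total_area / sequences


if __name__ == "__main__":
    counts = synthesis_counts(n=2, l=3)      # gly/phe trimers
    print(counts)
    # Assumed total area of 1 cm^2, purely for illustration.
    print(synthesis_area(1.0, counts["sequences"]), "cm^2 per sequence")
```

The same functions can be called with a larger basis set, for example
n=4 for nucleotides, to see how quickly the number of sequences grows
relative to the number of masking cycles.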

It will be appreciated by those of skill in the art that the above
method could readily be used to simultaneously produce thousands or
millions of oligomers on a substrate using the photolithographic
techniques disclosed herein. Consequently, the method results in the
ability to practically test large numbers of, for example, di, tri,
tetra, penta, hexa, hepta, octapeptides, dodecapeptides, or larger
polypeptides (or correspondingly, polynucleotides).
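
The mask-directed cycle of Example B can be sketched as a short
simulation. The Python fragment below is illustrative only and is not
the procedure of FIG. 10 itself: it assumes that in each layer the
regions are divided into equal blocks, and that each monomer is
coupled to the blocks left unmasked for it. For the first layer this
reproduces the description above (the first four regions receive "A",
the last four receive "B"); the rule used for later layers, and the
function name `light_directed_synthesis`, are assumptions.

```python
def light_directed_synthesis(monomers, length):
    """Simulate n x l mask/couple cycles over n**l regions.

    Returns the sequence built in each region (in coupling order)
    and the number of lithographic steps used. Illustrative sketch;
    the block-wise mask layout is an assumption.
    """
    n = len(monomers)
    regions = [""] * (n ** length)
    steps = 0
    for layer in range(length):
        # Width of a contiguous block of regions sharing one monomer
        block = n ** (length - 1 - layer)
        for index, monomer in enumerate(monomers):
            # Step 1: irradiate through a mask exposing selected regions
            exposed = [i for i in range(len(regions))
                       if (i // block) % n == index]
            # Step 2: couple the monomer only at the exposed regions
            for i in exposed:
                regions[i] += monomer
            steps += 1
    return regions, steps


if __name__ == "__main__":
    sequences, steps = light_directed_synthesis("AB", 3)
    print(steps, "lithographic steps")
    print(sequences)
```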

The above example has illustrated the method by way of a manual
example. It will of course be appreciated that automated or
semi-automated methods could be used. The substrate would be mounted
in a flow cell for automated addition and removal of reagents, to
minimize the volume of reagents needed, and to more carefully control
reaction conditions. Successive masks could be applied manually or
automatically.

C. Synthesis of a Dimer of an Aminopropyl Group and
a Fluorescent Group

In synthesizing the dimer of an aminopropyl group and a fluorescent
group, a functionalized Durapore membrane was used as a substrate.
The Durapore membrane was a polyvinylidene difluoride with
aminopropyl groups. The aminopropyl groups were protected with the
DDZ group by reaction of the carbonyl chloride with the amino groups,
a reaction readily known to those of skill in the art. The surface
bearing these groups was placed in a solution of THF and contacted
with a mask bearing a checkerboard pattern of 1 mm opaque and
transparent regions. The mask was exposed to ultraviolet light having
a wavelength down to at least about 280 nm for about 5 minutes at
ambient temperature, although a wide range of exposure times and
temperatures may be appropriate in various embodiments of the
invention. For example, in one embodiment, an exposure time of
between about 1 and 5000 seconds may be used at process temperatures
of between -70° and +50° C.

In one preferred embodiment, exposure times of between about 1 and
500 seconds at about ambient pressure are used. In some preferred
embodiments, pressure above ambient is used to prevent evaporation.

The surface of the membrane was then washed for about 1 hour with a
fluorescent label which included an active ester bound to a chelate
of a lanthanide. Wash times will vary over a wide range of values
from about a few minutes to a few hours. These materials fluoresce in
the red and the green visible region. After the reaction with the
active ester in the fluorophore was complete, the locations in which
the fluorophore was bound could be visualized by exposing them to
ultraviolet light and observing the red and the green fluorescence.
It was observed that the derivatized regions of the substrate closely
corresponded to the original pattern of the mask.

D. Demonstration of Signal Capability 

Signal detection capability was demonstrated using a low-level
standard fluorescent bead kit manufactured by Flow Cytometry
Standards and having model no. 824. This kit includes 5.8 µm diameter
beads, each impregnated with a known number of fluorescein molecules.

One of the beads was placed in the illumination field on the scan
stage as shown in FIG. 9 in a field of a laser spot which was
initially shuttered. After the bead was positioned in the
illumination field, the photon detection equipment was turned on. The
laser beam was unblocked and it interacted with the particle bead,
which then fluoresced. Fluorescence curves of beads impregnated with
7,000 and 13,000 fluorescein molecules are shown in FIGS. 11A and
11B, respectively. On each curve, traces for beads without
fluorescein molecules are also shown. These experiments were
performed with 488 nm excitation, with 100 µW of laser power. The
light was focused through a 40 power 0.75 NA objective.

The fluorescence intensity in all cases started off at a high value
and then decreased exponentially. The fall-off in intensity is due to
photobleaching of the fluorescein molecules. The traces of beads
without fluorescein molecules are used for background subtraction.
The difference in the initial exponential decay between labeled and
nonlabeled beads is integrated to give the total number of photon
counts, and this number is related to the number of molecules per
bead. Therefore, it is possible to deduce the number of photons per
fluorescein molecule that can be detected. For the curves illustrated
in FIGS. 11A and 11B, this calculation indicates that about 40 to 50
photons per fluorescein molecule are detected.
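
The background-subtracted integration described above can be sketched
as follows. This Python fragment is a minimal illustration under
stated assumptions: the traces are taken to be count rates sampled at
known times, the integral is approximated with the trapezoidal rule,
and the decay curve used in the usage example is synthetic, not data
from FIGS. 11A and 11B. The function names are hypothetical.

```python
import math


def integrate_trapezoid(times, values):
    """Trapezoidal-rule integral of sampled values over sampled times."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
    return total


def photons_per_molecule(times, labeled, unlabeled, molecules_per_bead):
    """Integrate (labeled - unlabeled) counts and divide by molecule count."""
    difference = [a - b for a, b in zip(labeled, unlabeled)]
    return integrate_trapezoid(times, difference) / molecules_per_bead


if __name__ == "__main__":
    # Synthetic traces for illustration only (not measured data):
    # a flat background plus an exponentially decaying signal.
    times = [0.1 * i for i in range(101)]
    unlabeled = [50.0 for _ in times]
    labeled = [50.0 + 1.0e5 * math.exp(-t / 3.0) for t in times]
    print(photons_per_molecule(times, labeled, unlabeled, 7000))
```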

E. Determination of the Number of 
Molecules Per Unit Area 

Aminopropylated glass microscope slides prepared 
according to the methods discussed above were utilized 
in order to establish the density of labeling of the slides. 
The free amino termini of the slides were reacted with 
FITC (fluorescein isothiocyanate) which forms a cova- 
lent linkage with the amino group. The slide is then 
scanned to count the number of fluorescent photons 
generated in a region which, using the estimated 40-50 
photons per fluorescent molecule, enables the calcula- 
tion of the number of molecules which are on the sur- 
face per unit area. 

A slide with aminopropyl silane on its surface was 
immersed in a 1 mM solution of FITC in DMF for 1
hour at about ambient temperature. After reaction, the 
slide was washed twice with DMF and then washed 
with ethanol, water, and then ethanol again. It was then 
dried and stored in the dark until it was ready to be 
examined. 

Through the use of curves similar to those shown in FIGS. 11A and
11B, and by integrating the fluorescent counts under the
exponentially decaying signal, the number of free amino groups on the
surface after derivatization was determined. It was determined that
slides with labeling densities of 1 fluorescein per 10^3 × 10^3 to
~2×2 nm could be reproducibly made as the concentration of
aminopropyltriethoxysilane varied from 10^-5% to 10^-1%.
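
To make the labeling densities easier to compare, the sketch below
converts "one molecule per d × d nm" into molecules per square
micrometer, and converts an integrated photon count into a surface
density using a photons-per-molecule figure. It is an illustrative
Python fragment only: the function names are hypothetical, both
spacings are assumed to be in nanometers, and the numbers in the
usage example are placeholders rather than measured values.

```python
def density_per_um2(spacing_nm: float) -> float:
    """Molecules per square micrometer for one molecule per
    spacing_nm x spacing_nm area (1 um = 1000 nm)."""
    if spacing_nm <= 0:
        raise ValueError("spacing_nm must be positive")
    return (1000.0 / spacing_nm) ** 2


def molecules_per_area(photon_counts: float,
                       photons_per_molecule: float,
                       area_um2: float) -> float:
    """Surface density = (counts / photons per molecule) / scanned area."""
    if photons_per_molecule <= 0 or area_um2 <= 0:
        raise ValueError("photons_per_molecule and area_um2 must be positive")
    return photon_counts / photons_per_molecule / area_um2


if __name__ == "__main__":
    print(density_per_um2(1000.0))  # sparse end of the stated range
    print(density_per_um2(2.0))     # dense end of the stated range
    # Placeholder inputs for illustration: 4500 counts, 45 photons per
    # molecule (within the estimated 40-50), 10 um^2 scanned.
    print(molecules_per_area(4500.0, 45.0, 10.0))
```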

F. Removal of NVOC and Attachment of A Fluores- 
cent Marker 

NVOC-GABA groups were attached as described 
above. The entire surface of one slide was exposed to 
light so as to expose a free amino group at the end of the 
gamma amino butyric acid. This slide, and a duplicate 
which was not exposed, were then exposed to fluores- 
cein isothiocyanate (FITC). 






5,445,934 


FIG. 12A illustrates the slide which was not exposed to light, but
which was exposed to FITC. The units of the x axis are time and the
units of the y axis are counts. The trace contains a certain amount
of background fluorescence. The duplicate slide was exposed to 350 nm
broadband illumination for about 1 minute (12 mW/cm^2, ~350 nm
illumination), washed and reacted with FITC. The fluorescence curves
for this slide are shown in FIG. 12B. A large increase in the level
of fluorescence is observed, which indicates photolysis has exposed a
number of amino groups on the surface of the slides for attachment of
a fluorescent marker.

G. Use of a Mask in Removal of NVOC

The next experiment was performed with a 0.1% aminopropylated slide.
Light from a Hg-Xe arc lamp was imaged onto the substrate through a
laser-ablated chrome-on-glass mask in direct contact with the
substrate.

This slide was illuminated for approximately 5 minutes, with 12 mW of
350 nm broadband light and then reacted with the 1 mM FITC solution.
It was put on the laser detection scanning stage and a graph was
plotted as a two-dimensional representation of position color-coded
for fluorescence intensity. The fluorescence intensity (in counts) as
a function of location is given on the color scale to the right of
FIG. 13A for a mask having 100×100 µm squares.

The experiment was repeated a number of times through various masks.
The fluorescence pattern for a 50 µm mask is illustrated in FIG. 13B,
for a [illegible] µm mask in FIG. 13C, and for a 10 µm mask in FIG.
13D. The mask pattern is distinct down to at least about 10 µm
[illegible].

H. Attachment of YGGFL and Subsequent Exposure to Herz Antibody and
Goat Antimouse

[illegible] polypeptide sequence would bind to a surface-bound
peptide and be detected, Leu enkephalin was coupled to the surface
and recognized by an antibody. A slide was derivatized with
aminopropyltriethoxysilane and protected with NVOC. A 500 µm
checkerboard mask was used to expose the slide in a flow cell using
backside contact printing. The Leu enkephalin sequence
(H2N-tyrosine,glycine,glycine,phenylalanine,leucine-CO2H, otherwise
referred to herein as YGGFL) was attached via its carboxy end to the
exposed amino groups on the surface of the slide. The peptide was
added in DMF solution with the BOP/HOBT/DIEA coupling reagents and
recirculated through the flow cell for 2 hours at room temperature.

A first antibody, known as the Herz antibody, was applied to the
surface of the slide for 45 minutes at 2 µg/ml in a supercocktail
(containing 1% BSA and 1% ovalbumin also in this case). A second
antibody, goat anti-mouse fluorescein conjugate, was then added at 2
µg/ml in the supercocktail buffer, and allowed to incubate for 2
hours. An image taken at 10 µm steps indicated that not only can
deprotection be carried out in a well defined pattern, but also that
(1) the method provides for successful coupling of peptides to the
surface of the substrate, (2) the surface of a bound peptide is
available for binding with an antibody, and (3) that the detection
apparatus capabilities are sufficient to detect binding of a
receptor.

I. Monomer-by-Monomer Formation of YGGFL and Subsequent Exposure to
Labeled Antibody

Monomer-by-monomer synthesis of YGGFL and GGFL in alternate squares
was performed on a slide in a checkerboard pattern and the resulting
slide was exposed to the Herz antibody. This experiment and the
results thereof are illustrated in FIGS. 14A, 14B, 15A, and 15B.

In FIG. 14A, a slide is shown which is derivatized with the
aminopropyl group, protected in this case with t-BOC
(t-butoxycarbonyl). The slide was treated with TFA to remove the
t-BOC protecting group. ε-aminocaproic acid, which was t-BOC
protected at its amino group, was then coupled onto the aminopropyl
groups. The aminocaproic acid serves as a spacer between the
aminopropyl group and the peptide to be synthesized. The amino end of
the spacer was deprotected and coupled to NVOC-leucine. The entire
slide was then illuminated with 12 mW of 325 nm broadband
illumination. The slide was then coupled with NVOC-phenylalanine and
washed. The entire slide was illuminated, then coupled to
NVOC-glycine and washed. The slide was again illuminated and coupled
to NVOC-glycine to form the sequence shown in the last portion of
FIG. 14A.

As shown in FIG. 14B, alternating regions of the slide were then
illuminated in a projection print using a 500×500 µm checkerboard
mask; thus the amino group of glycine was exposed only in the lighted
[illegible] at those areas which had received illumination.
[illegible] the lighted areas and in the other areas, GGFL. The Herz
antibody (which recognizes the YGGFL, but not GGFL) was then added,
followed by goat anti-mouse fluorescein conjugate.

The resulting fluorescence scan is shown in FIG. 15A, and the color
coding for the fluorescence intensity is again given on the right.
Dark areas contain the tetrapeptide GGFL, which is not recognized by
the Herz antibody (and thus there is no binding of the goat
anti-mouse antibody with fluorescein conjugate), and in the red areas
YGGFL is present. The YGGFL pentapeptide is recognized by the Herz
antibody and, therefore, there is antibody in the lighted regions for
the fluorescein-conjugated goat anti-mouse to recognize.

Similar patterns are shown for a 50 µm mask used in direct contact
("proximity print") with the substrate in FIG. 15B. Note that the
pattern is more distinct and the corners of the checkerboard pattern
are touching when the mask is placed in direct contact with the
substrate (which reflects the increase in resolution using this
technique).

J. Monomer-by-Monomer Synthesis of YGGFL and PGGFL

A synthesis using a 50 µm checkerboard mask similar to that shown in
FIG. 15B was conducted. However, P was added to the GGFL sites on the
substrate through an additional coupling step. P was added by
exposing protected GGFL to light and subsequent exposure to P in the
manner set forth above. Therefore, half of the regions on the
substrate contained YGGFL and the remaining half contained PGGFL.

The fluorescence plot for this experiment is provided in FIG. 16. As
shown, the regions are again readily discernable. This experiment
demonstrates that antibodies are able to recognize a specific
sequence and that the recognition is not length-dependent.

K. Monomer-by-Monomer Synthesis of YGGFL and YPGGFL

In order to further demonstrate the operability of the invention, a
50 µm checkerboard pattern of alternating YGGFL and YPGGFL was
synthesized on a substrate using techniques like those set forth
above. The resulting fluorescence plot is provided in FIG. 17. Again,
it is seen that the antibody is clearly able to recognize the YGGFL
sequence and does not bind significantly at the YPGGFL regions.

L. Synthesis of an Array of Sixteen Different Amino Acid Sequences
and Estimation of Relative Binding Affinity to Herz Antibody

Using techniques similar to those set forth above, an array of 16
different amino acid sequences (replicated four times) was
synthesized on each of two glass substrates. The sequences were
synthesized by attaching the sequence NVOC-GFL across the entire
surface of the slides. Using a series of masks, two layers of amino
acids were then selectively applied to the substrate. Each region had
dimensions of 0.25 cm × 0.0625 cm. The first slide contained amino
acid sequences containing only L amino acids while the second slide
contained selected D amino acids. FIGS. 18A and 18B illustrate a map
of the various regions on the first and second slides, respectively.
The patterns shown in FIGS. 18A and 18B were duplicated four times on
each slide. The slides were then exposed to the Herz antibody and
fluorescein-labeled goat anti-mouse.

FIG. 19 is a fluorescence plot of the first slide, which contained
only L amino acids. Red indicates strong binding (149,000 counts or
more) while black indicates little or no binding of the Herz antibody
(20,000 counts or less). The bottom right-hand portion of the slide
appears "cut off" because the slide was broken during processing. The
sequence YGGFL is clearly most strongly recognized. The sequences
YAGFL and YSGFL also exhibit strong recognition of the antibody. By
contrast, most of the remaining sequences show little or no binding.
The four duplicate portions of the slide are extremely consistent in
the amount of binding shown therein.

FIG. 20 is a fluorescence plot of the second slide. Again, strongest
binding is exhibited by the YGGFL sequence. Significant binding is
also detected to [illegible] YpGFL (where L-amino acids are
identified by one upper case letter abbreviation, and D-amino acids
are identified by one lower case letter abbreviation). The remaining
sequences show less binding with the antibody. Note the low binding
efficiency of the [illegible] sequence.

Table 6 lists the various sequences tested in order of relative
fluorescence, which provides information regarding relative binding
affinity.

TABLE 6
Apparent Binding to Herz Ab
[column headings and table entries illegible]

VIII. Illustrative Alternative Embodiment

According to an alternative embodiment of the invention, the methods
provide for attaching to the surface a caged binding member which in
its caged form has a relatively low affinity for other potentially
binding species, such as receptors and specific binding substances.

According to this alternative embodiment, the invention provides
methods for forming predefined regions on a surface of a solid
support, wherein the predefined regions are capable of immobilizing
receptors. The methods make use of caged binding members attached to
the surface to enable selective activation of the predefined regions.
The caged binding members are liberated to act as binding members
ultimately capable of binding receptors upon selective activation of
the predefined regions. The activated binding members are then used
to immobilize specific molecules such as receptors on the predefined
region of the surface. The above procedure is repeated at the same or
different sites on the surface so as to provide a surface prepared
with a plurality of regions on the surface containing, for example,
the same or different receptors. When receptors immobilized in this
way have a differential affinity for one or more ligands, screenings
and assays for the ligands can be conducted in the regions of the
surface containing the receptors.

The alternative embodiment may make use of novel caged binding
members attached to the substrate. Caged (unactivated) members have a
relatively low affinity for receptors of substances that specifically
bind to uncaged binding members when compared with the corresponding
affinities of activated binding members. [illegible] members are
protected from reaction [illegible] surface desired to be activated.
Upon application of a suitable energy source, the caging groups
labilize, [illegible] presenting the activated binding member. A
[illegible] energy source will be light.

Once the binding members on the surface are activated [illegible] a
monoclonal [illegible], a nucleic [illegible] receptor, etc.
[illegible] attaching it, directly or indirectly, to a binding
member. For example, a specific binding substance having a strong
binding affinity for the binding member and a strong affinity for the
receptor or a conjugate of the receptor may be used to act as a
bridge between binding members and receptors if desired. The method
uses a receptor prepared such that the receptor retains its activity
toward a particular ligand.

Preferably, the caged binding member attached to the solid substrate
will be a photoactivatable biotin complex, i.e., a biotin molecule
that has been chemically modified with photoactivatable protecting
groups so that it has a significantly reduced binding affinity for
avidin or avidin analogs than does natural biotin. In a preferred
embodiment, the protecting groups localized in a predefined region of
the surface will be removed upon application of a suitable source of
radiation to give binding members that are biotin or a functionally
analogous compound having substantially the same binding affinity for
avidin or avidin analogs as does biotin.

In another preferred embodiment, avidin or an avidin 
analog is incubated with activated binding members on 
the surface until the avidin binds strongly to the binding 
members. The avidin so immobilized on predefined 
regions of the surface can then be incubated with a 
desired receptor or conjugate of a desired receptor. The 
receptor will preferably be biotinylated, e.g., a bi- 
otinylated antibody, when avidin is immobilized on the 
predefined regions of the surface. Alternatively, a pre- 
ferred embodiment will present an avidin/biotinylated 
receptor complex, which has been previously prepared, 
to activated binding members on the surface. 
IX. Conclusion 

The present inventions provide greatly improved 
methods and apparatus for synthesis of polymers on 
substrates. It is to be understood that the above descrip- 
tion is intended to be illustrative and not restrictive. 
Many embodiments will be apparent to those of skill in 
the art upon reviewing the above description. By way 
of example, the invention has been described primarily 
with reference to the use of photoremovable protective 
groups, but it will be readily recognized by those of skill 
in the art that sources of radiation other than light could 
also be used. For example, in some embodiments it may be desirable to
use protective groups which are sensitive to electron beam
irradiation or x-ray irradiation, in combination with electron beam
lithography or x-ray lithography techniques. Alternatively, the group
could be removed by exposure to an electric current. The scope of the
invention should, therefore, be determined not with reference to the
above description, but should instead be determined with reference to
the appended claims, along with the full scope of equivalents to
which such claims are entitled.
What is claimed is: 

1. A substrate with a surface comprising 10^3 or more groups of
oligonucleotides with different, known sequences covalently attached
to the surface in discrete known regions, said 10^3 or more groups of
oligonucleotides occupying a total area of less than 1 cm^2 on said
substrate, said groups of oligonucleotides having different
nucleotide sequences.

2. The substrate as recited in claim 1 wherein said substrate
comprises 10^4 or more different groups of oligonucleotides with
known sequences covalently coupled to discrete known regions of said
substrate.

3. The substrate as recited in claim 1 wherein said substrate
comprises 10^5 or more different groups of oligonucleotides with
known sequences in discrete known regions.

4. The substrate as recited in claim 1 wherein said substrate
comprises 10^6 or more different groups of oligonucleotides with
known sequences in discrete known regions.

5. The substrate as recited in claim 1 wherein said 
groups of oligonucleotides are at least 50% pure within 
said discrete known regions. 

6. The substrate as recited in claim 1 wherein the 
groups of oligonucleotides are attached to the surface 
by a linker. 

7. An array of more than 1,000 different groups of oligonucleotide
molecules with known sequences covalently coupled to a surface of a
substrate, said groups of oligonucleotide molecules each in discrete
known regions and differing from other groups of oligonucleotide
molecules in monomer sequence, each of said discrete known regions
being an area of less than about 0.01 cm^2 and each discrete known
region comprising oligonucleotides of known sequence, said different
groups occupying a total area of less than 1 cm^2.

8. The array as recited in claim 7 wherein said area is
less than 10,000 microns^2.

9. The array as recited in claim 7 made by the process 
of: 

exposing a first region of said substrate to light to 
remove photoremovable groups from nucleic acids 
in said first region, and not exposing a second re- 
gion of said surface to light; 

covalently coupling a first nucleotide to said nucleic 
acids on said part of said substrate exposed to light, 
said first nucleotide covalently coupled to said 
photoremovable group; 

exposing a part of said first region of said substrate to 
light, and not exposing another part of said first 
region of said substrate to light to remove said 
photoremovable groups; 

covalently coupling a second nucleotide to said part 
of said first region exposed to light; and 

repeating said steps of exposing said substrate to light 
and covalently coupling nucleotides until said 
more than 500 different groups of nucleotides are 
formed on said surface. 

10. The array as recited in claim 7 comprising more 
than 10,000 groups of oligonucleotides of known se- 
quences. 

* * * * *
ABSTRACT

Libraries of unimolecular, double-stranded oligonucleotides on a
solid support. These libraries are useful in pharmaceutical discovery
for the screening of numerous biological samples for specific
interactions between the double-stranded oligonucleotides, and
peptides, proteins, drugs and RNA. In a related aspect, the present
invention provides libraries of conformationally restricted probes on
a solid support. The probes are restricted in their movement and
flexibility using double-stranded oligonucleotides as scaffolding.
The probes are also useful in various screening procedures associated
with drug discovery and diagnosis. The present invention further
provides methods for the preparation and screening of the above
libraries.

C Claims, 1 Drawing Sheet
SURFACE-BOUND, UNIMOLECULAR,
DOUBLE-STRANDED DNA

GOVERNMENT RIGHTS

Research leading to the invention was funded in part by
NIH Grant No. R01HG00813-03 and the government may
have certain rights to the invention.

BACKGROUND OF THE INVENTION

The present invention relates to the field of polymer synthesis and
the use of polymer libraries for biological screening. More
specifically, in one embodiment the invention provides arrays of
diverse double-stranded oligonucleotide sequences. In another
embodiment, the invention provides arrays of conformationally
restricted probes, wherein the probes are held in position using
double-stranded DNA sequences as scaffolding. Libraries of diverse
unimolecular double-stranded nucleic acid sequences and probes may be
used, for example, in screening studies for determination of binding
affinity exhibited by binding proteins, drugs, or RNA.

Methods of synthesizing desired single stranded DNA sequences are
well known to those of skill in the art. In particular, methods of
synthesizing oligonucleotides are found in, for example,
Oligonucleotide Synthesis: A Practical Approach, Gait, ed., IRL
Press, Oxford (1984), incorporated herein by reference in its
entirety for all purposes. Synthesizing unimolecular double-stranded
DNA in solution has also been described. See Durand, et al. Nucleic
Acids Res. 18:6353-6359 (1990) and Thomson, et al. Nucleic Acids Res.
21:5600-5603 (1993), the disclosures of both being incorporated
herein by reference.

Solid phase synthesis of biological polymers has been evolving since
the early "Merrifield" solid phase peptide synthesis, described in
Merrifield, J. Am. Chem. Soc. 85:2149-2154 (1963), incorporated
herein by reference for all purposes. Solid-phase synthesis
techniques have been provided for the synthesis of several peptide
sequences on, for example, a number of "pins." See e.g., Geysen et
al., J. Immun. Meth. 102:259-274 (1987), incorporated herein by
reference for all purposes. Other solid-phase techniques involve, for
example, synthesis of various peptide sequences on different
cellulose disks supported in a column. See Frank and Doring,
Tetrahedron 44:6031-6040 (1988), incorporated herein by reference for
all purposes. Still other solid-phase techniques are described in
U.S. Pat. No. 4,728,502 issued to Hamill and WO 90/00626 (Beattie,
inventor).

Each of the above techniques produces only a relatively 
low density array of polymers. For example, the technique 
described in Geysen et al. is limited to producing 96
different polymers on pins spaced in the dimensions of a
standard microtiter plate.

Improved methods of forming large arrays of oligonucle- 
otides, peptides and other polymer sequences in a short 
period of time have been devised. Of particular note, Pirrung
et al., U.S. Pat. No. 5,143,854 (see also PCT Application No.
WO 90/15070) and Fodor et al., PCT Publication No. WO
92/10092, all incorporated herein by reference, disclose
methods of forming vast arrays of peptides, oligonucleotides
and other polymer sequences using, for example, light-
directed synthesis techniques. See also, Fodor et al., Science,
251:767-777 (1991), also incorporated herein by reference
for all purposes. These procedures are now referred to as 
VLSIPS™ procedures. 
In the above-referenced Fodor et al., PCT application, an
elegant method is described for using a computer-controlled
system to direct a VLSIPS™ procedure. Using this
approach, one heterogenous array of polymers is converted,
through simultaneous coupling at a number of reaction sites,
into a different heterogenous array. See U.S. Pat. No.
5,384,261 and U.S. application Ser. No. 07/980,523, the
disclosures of which are incorporated herein for all pur- 
poses. 

The development of VLSIPS™ technology as described 
in the above-noted U.S. Pat. No. 5,143,854 and PCT patent
publication Nos. WO 90/15070 and 92/10092, is considered
pioneering technology in the fields of combinatorial synthe-
sis and screening of combinatorial libraries. More recently,
patent application Ser. No. 08/082,937, filed Jun. 25, 1993,
now abandoned, describes methods for making arrays of 
oligonucleotide probes that can be used to check or deter- 
mine a partial or complete sequence of a target nucleic acid 
and to detect the presence of a nucleic acid containing a 
specific oligonucleotide sequence. 

A number of biochemical processes of pharmaceutical
interest involve the interaction of some species, e.g., a drug,
a peptide or protein, or RNA, with double-stranded DNA.
For example, protein/DNA binding interactions are involved
with a number of transcription factors as well as tumor 
suppression associated with the p53 protein and the genes 
contributing to a number of cancer conditions. 

SUMMARY OF THE INVENTION 

High-density arrays of diverse unimolecular, double- 
stranded oligonucleotides, as well as arrays of conforma-
tionally restricted probes and methods for their use are
provided by virtue of the present invention. In addition,
methods and devices for detecting duplex formation of
oligonucleotides on an array of diverse single-stranded
oligonucleotides are also provided by this invention. Fur-
ther, an adhesive based on the specific binding characteris-
tics of two arrays of complementary oligonucleotides is 
provided in the present invention. 

According to one aspect of the present invention, libraries 
of unimolecular, double-stranded oligonucleotides are pro-
vided. Each member of the library is comprised of a solid
support, an optional spacer for attaching the double-stranded
oligonucleotide to the support and for providing sufficient
space between the double-stranded oligonucleotide and the
solid support for subsequent binding studies and assays, an
oligonucleotide attached to the spacer and further attached to
a second complementary oligonucleotide by means of a
flexible linker, such that the two oligonucleotide portions
exist in a double-stranded configuration. More particularly,
the members of the libraries of the present invention can be 
represented by the formula: 

Y-L¹-X¹-L²-X²

in which Y is a solid support, L¹ is a bond or a spacer, L² is
a flexible linking group, and X¹ and X² are a pair of
complementary oligonucleotides. 
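
As an illustration only, the general formula Y-L¹-X¹-L²-X² can be modeled as a small record type. The following Python sketch is not part of the disclosed method: the class name `LibraryMember`, its field names, and the example values are hypothetical, and it assumes DNA sequences written 5' to 3' with exact Watson-Crick pairing (no bulges or loops).

```python
from dataclasses import dataclass

# Watson-Crick partners for DNA bases (assumption: DNA only, no analogs).
_PARTNER = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(_PARTNER[base] for base in reversed(seq.upper()))


@dataclass(frozen=True)
class LibraryMember:
    """Hypothetical record for one member of the formula Y-L1-X1-L2-X2."""
    support: str  # Y: the solid support
    spacer: str   # L1: spacer name, or "" when L1 is a direct bond
    x1: str       # X1: first oligonucleotide, written 5'->3'
    linker: str   # L2: flexible linking group
    x2: str       # X2: second oligonucleotide, written 5'->3'

    def is_fully_complementary(self) -> bool:
        """True when X2 is the exact reverse complement of X1."""
        return self.x2.upper() == reverse_complement(self.x1)

    def formula(self) -> str:
        """Render the member as a hyphen-joined string, omitting an empty spacer."""
        parts = [self.support, self.spacer, self.x1, self.linker, self.x2]
        return "-".join(p for p in parts if p)


# Example values are invented for illustration.
member = LibraryMember("glass", "PEG", "GATTACA", "HEG", "TGTAATC")
print(member.formula(), member.is_fully_complementary())
```

Storing both strands in the 5' to 3' convention means a hairpin-forming pair is detected by a reverse-complement comparison rather than a position-by-position one.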

In a specific aspect of the invention, the library of 
different unimolecular, double-stranded oligonucleotides
can be used for screening a sample for a species which binds 
to one or more members of the library. 

In a related aspect of the invention, a library of different 
conformationally-restricted probes attached to a solid sup-
port is provided. The individual members each have the
formula:

-X¹¹-Z-X¹²

in which X¹¹ and X¹² are complementary oligonucleotides
and Z is a probe having sufficient length such that X¹¹ and
X¹² form a double-stranded oligonucleotide portion of the
member and thereby restrict the conformations available to
the probe. In a specific aspect of the invention, the library of
different conformationally-restricted probes can be used for
screening a sample for a species which binds to one or more
probes in the library.

According to yet another aspect of the present invention, 
methods and devices for the bioelectronic detection of
duplex formation are provided.

According to still another aspect of the invention, an 
adhesive is provided which comprises two surfaces of
complementary oligonucleotides. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIGS. 1A to 1F illustrate the preparation of a member of
a library of surface-bound, unimolecular double-stranded
DNA as well as binding studies with receptors having
specificity for either the double stranded DNA portion, a
probe which is held in a conformationally restricted form by
DNA scaffolding, or a bulge or loop region of RNA.

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

Abbreviations 

The following abbreviations are used herein: phi, phenan-
threnequinone diimine; phen', 5-amido-glutaric acid-1,10-
phenanthroline; dppz, dipyridophenazine.
Glossary

The following terms are intended to have the following
general meanings as they are used herein:

Chemical terms: As used herein, the term "alkyl" refers to
a saturated hydrocarbon radical which may be straight-chain
or branched-chain (for example, ethyl, isopropyl, t-amyl, or
2,5-dimethylhexyl). When "alkyl" or "alkylene" is used to
refer to a linking group or a spacer, it is taken to be a group
having two available valences for covalent attachment, for
example, -CH₂CH₂-, -CH₂CH₂CH₂-,
-CH₂CH₂CH(CH₃)CH₂- and -CH₂(CH₂CH₂)₂CH₂-.
Preferred alkyl groups as substituents are those containing 1
to 10 carbon atoms, with those containing 1 to 6 carbon
atoms being particularly preferred. Preferred alkyl or alky-
lene groups as linking groups are those containing 1 to 20
carbon atoms, with those containing 3 to 6 carbon atoms
being particularly preferred. The term "polyethylene glycol"
is used to refer to those molecules which have repeating
units of ethylene glycol, for example, hexaethylene glycol
(HO-(CH₂CH₂O)₅-CH₂CH₂OH). When the term "poly-
ethylene glycol" is used to refer to linking groups and spacer
groups, it would be understood by one of skill in the art that
other polyethers or polyols could be used as well (i.e.,
polypropylene glycol or mixtures of ethylene and propylene
glycols).

The term "protecting group" as used herein, refers to any
of the groups which are designed to block one reactive site
in a molecule while a chemical reaction is carried out at
another reactive site. More particularly, the protecting
groups used herein can be any of those groups described in
Greene, et al., Protective Groups In Organic Chemistry, 2nd
Ed., John Wiley & Sons, New York, N.Y., 1991, incorporated
herein by reference. The proper selection of protecting
groups for a particular synthesis will be governed by the
overall methods employed in the synthesis. For example, in
"light-directed" synthesis, discussed below, the protecting
groups will be photolabile protecting groups such as NVOC,
MeNPOC, and those disclosed in co-pending Application
PCT/US93/10162 (filed Oct. 22, 1993), incorporated herein
by reference. In other methods, protecting groups may be
removed by chemical methods and include groups such as
FMOC, DMT and others known to those of skill in the art.

Complementary or substantially complementary: Refers 
to the hybridization or base pairing between nucleotides or 
nucleic acids, such as, for instance, between the two strands
of a double stranded DNA molecule or between an oligo-
nucleotide primer and a primer binding site on a single
stranded nucleic acid to be sequenced or amplified. Comple-
mentary nucleotides are, generally, A and T (or A and U), or
C and G. Two single stranded RNA or DNA molecules are
said to be substantially complementary when the nucleotides
of one strand, optimally aligned and compared and with
appropriate nucleotide insertions or deletions, pair with at
least about 80% of the nucleotides of the other strand,
usually at least about 90% to 95%, and more preferably from
about 98 to 100%. 
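
The percentage test in this definition can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the definition itself: it assumes two strands of equal length written 5' to 3', aligned antiparallel with no insertions or deletions (the definition above permits them), and the function names and the 80% default are hypothetical conveniences taken from the lowest threshold quoted.

```python
# Canonical base pairs named in the definition: A-T (or A-U) and C-G.
_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
          ("C", "G"), ("G", "C")}


def percent_complementarity(strand_a: str, strand_b: str) -> float:
    """Percent of positions that pair when the strands are aligned antiparallel.

    Both strands are written 5'->3'; strand_b is reversed so that position i
    of strand_a faces its partner. Gaps are not modeled (an assumption).
    """
    if not strand_a or len(strand_a) != len(strand_b):
        raise ValueError("strands must be non-empty and of equal length")
    a = strand_a.upper()
    b = strand_b.upper()[::-1]
    paired = sum((x, y) in _PAIRS for x, y in zip(a, b))
    return 100.0 * paired / len(a)


def is_substantially_complementary(strand_a: str, strand_b: str,
                                   threshold: float = 80.0) -> bool:
    """Apply the 'at least about 80%' criterion as a hard cutoff."""
    return percent_complementarity(strand_a, strand_b) >= threshold


# A perfect 10-mer duplex, then the same pair with one mismatch.
print(percent_complementarity("GATTACAGGT", "ACCTGTAATC"))
print(percent_complementarity("GATTACAGGT", "ACCTGTAATG"))
```

Treating "about 80%" as an exact cutoff is a simplification; a gapped alignment would be needed to honor the "appropriate nucleotide insertions or deletions" clause.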

Alternatively, substantial complementarity exists when an
RNA or DNA strand will hybridize under selective hybrid-
ization conditions to its complement. Typically, selective
hybridization will occur when there is at least about 65%
complementarity over a stretch of at least 14 to 25 nucle-
otides, preferably at least about 75%, more preferably at
least about 90% complementarity. See, M. Kanehisa Nucleic
Acids Res. 12:203 (1984), incorporated herein by reference.
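
The "stretch of at least 14 to 25 nucleotides" criterion can be illustrated with a sliding window. The sketch below is hypothetical and deliberately crude: it assumes equal-length strands written 5' to 3', an ungapped antiparallel alignment, and hard cutoffs in place of "about"; the function names and defaults are not from the source, and real hybridization also depends on the conditions discussed in the next paragraph.

```python
# Canonical base pairs: A-T (or A-U) and C-G.
_PAIRS = {("A", "T"), ("T", "A"), ("A", "U"), ("U", "A"),
          ("C", "G"), ("G", "C")}


def best_window_complementarity(strand_a: str, strand_b: str,
                                window: int = 14) -> float:
    """Highest percent pairing found in any contiguous stretch of `window` bases."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must be of equal length (no gaps modeled)")
    if window < 1 or len(strand_a) < window:
        raise ValueError("window must be between 1 and the strand length")
    a = strand_a.upper()
    b = strand_b.upper()[::-1]  # antiparallel alignment
    paired = [(x, y) in _PAIRS for x, y in zip(a, b)]
    best = max(sum(paired[i:i + window])
               for i in range(len(paired) - window + 1))
    return 100.0 * best / window


def may_hybridize_selectively(strand_a: str, strand_b: str,
                              window: int = 14,
                              threshold: float = 65.0) -> bool:
    """Apply the 'at least about 65% over at least 14 nucleotides' rule of thumb."""
    return best_window_complementarity(strand_a, strand_b, window) >= threshold
```

Raising `threshold` to 75.0 or 90.0 corresponds to the "preferably" and "more preferably" levels quoted above.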

Stringent hybridization conditions will typically include 
salt concentrations of less than about 1 M, more usually less
than about 500 mM and preferably less than about 200 mM.
Hybridization temperatures can be as low as 5° C., but are
typically greater than 22° C., more typically greater than
about 30° C. and preferably in excess of about 37° C.
Longer fragments may require higher hybridization tem-
peratures for specific hybridization. As other factors may
affect the stringency of hybridization, including base com-
position and length of the complementary strands, presence
of organic solvents and extent of base mismatching, the
combination of parameters is more important than the abso- 
lute measure of any one alone. 
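
For orientation, the salt and temperature ranges quoted above can be written as a simple lookup. This Python sketch is an assumption-laden illustration: pairing each salt range with a temperature range into named tiers is a hypothetical simplification (the text lists them separately and stresses that the combination of parameters matters more than any single value), and the names `HybridizationConditions` and `stringency_tier` are invented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HybridizationConditions:
    """Hypothetical container for two of the parameters named in the text."""
    salt_molar: float     # salt concentration in mol/L
    temperature_c: float  # hybridization temperature in degrees Celsius


def stringency_tier(c: HybridizationConditions) -> str:
    """Classify conditions against the quoted ranges, most stringent first.

    Only salt and temperature are considered; base composition, strand
    length, organic solvents and mismatching are ignored (an assumption).
    """
    if c.salt_molar < 0.2 and c.temperature_c > 37:
        return "preferred"
    if c.salt_molar < 0.5 and c.temperature_c > 30:
        return "more typical"
    if c.salt_molar < 1.0 and c.temperature_c > 22:
        return "typical"
    return "outside typical stringent range"


print(stringency_tier(HybridizationConditions(salt_molar=0.15, temperature_c=42)))
```

A result of "outside typical stringent range" does not mean hybridization fails; the text notes that temperatures as low as 5° C. can be used.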

Epitope: The portion of an antigen molecule which is 
delineated by the area of interaction with the subclass of 
receptors known as antibodies. 

Identifier tag: A means whereby one can identify which 
molecules have experienced a particular reaction in the
synthesis of an oligomer. The identifier tag also records the
step in the synthesis series in which the molecules experi-
enced that particular monomer reaction. The identifier tag
may be any recognizable feature which is, for example:
microscopically distinguishable in shape, size, color, optical
density, etc.; differently absorbing or emitting of light;
chemically reactive; magnetically or electronically encoded; 
or in some other way distinctively marked with the required 
information. A preferred example of such an identifier tag is 
an oligonucleotide sequence. 

Ligand/Probe: A ligand is a molecule that is recognized by
a particular receptor. The agent bound by or reacting with a
receptor is called a "ligand," a term which is definitionally
meaningful only in terms of its counterpart receptor. The
term "ligand" does not imply any particular molecular size
or other structural or compositional feature other than that
the substance in question is capable of binding or otherwise
interacting with the receptor. Also, a ligand may serve either
as the natural ligand to which the receptor binds, or as a
functional analogue that may act as an agonist or antagonist.
Examples of ligands that can be investigated by this inven-
tion include, but are not restricted to, agonists and antago-
nists for cell membrane receptors, toxins and venoms, viral
epitopes, hormones (e.g., opiates, steroids, etc.), hormone
receptors, peptides, enzymes, enzyme substrates, substrate
analogs, transition state analogs, cofactors, drugs, proteins,
and antibodies. The term "probe" refers to those molecules
which are expected to act like ligands but for which binding
information is typically unknown. For example, if a receptor
is known to bind a ligand which is a peptide β-turn, a
"probe" or library of probes will be those molecules
designed to mimic the peptide β-turn. In instances where the
particular ligand associated with a given receptor is
unknown, the term probe refers to those molecules designed
as potential ligands for the receptor.

Monomer: Any member of the set of molecules which can 
be joined together to form an oligomer or polymer. The set 
of monomers useful in the present invention includes, but is 
not restricted to, for the example of oligonucleotide synthe- 
sis, the set of nucleotides consisting of adenine, thymine,
cytosine, guanine, and uridine (A, T, C, G, and U, respec-
tively) and synthetic analogs thereof. As used herein, mono-
mers refers to any member of a basis set for synthesis of an
oligomer. Different basis sets of monomers may be used at
successive steps in the synthesis of a polymer.

Oligomer or Polymer: The oligomer or polymer 
sequences of the present invention are formed from the 
chemical or enzymatic addition of monomer subunits. Such 
oligomers include, for example, both linear, cyclic, and 
branched polymers of nucleic acids, polysaccharides, phos-
pholipids, and peptides having either α-, β-, or ω-amino
acids, heteropolymers in which a known drug is covalently
bound to any of the above, polyurethanes, polyesters, poly-
carbonates, polyureas, polyamides, polyethyleneimines,
polyarylene sulfides, polysiloxanes, polyimides, polyac-
etates, or other polymers which will be readily apparent to
one skilled in the art upon review of this disclosure. As used
herein, the term oligomer or polymer is meant to include
such molecules as β-turn mimetics, prostaglandins and ben-
zodiazepines which can also be synthesized in a stepwise
fashion on a solid support. 

Peptide: A peptide is an oligomer in which the monomers 
are amino acids and which are joined together through
amide bonds and alternatively referred to as a polypeptide.
In the context of this specification it should be appreciated
that when α-amino acids are used, they may be the L-optical
isomer or the D-optical isomer. Other amino acids which are
useful in the present invention include unnatural amino acids
such as β-alanine, phenylglycine, homoarginine and the like.
Peptides are more than two amino acid monomers long, and
often more than 20 amino acid monomers long. Standard
abbreviations for amino acids are used (e.g., P for proline).
These abbreviations are included in Stryer, Biochemistry,
Third Ed., (1988), which is incorporated herein by reference
for all purposes.

Oligonucleotides: An oligonucleotide is a single-stranded 
DNA or RNA molecule, typically prepared by synthetic 
means. Alternatively, naturally occurring oligonucleotides, 
or fragments thereof, may be isolated from their natural 
sources or purchased from commercial sources. Those oli-
gonucleotides employed in the present invention will be 4 to
100 nucleotides in length, preferably from 6 to 30 nucle-
otides, although oligonucleotides of different length may be
appropriate. Suitable oligonucleotides may be prepared by
the phosphoramidite method described by Beaucage and
Carruthers, Tetrahedron Lett., 22:1859-1862 (1981), or by
the triester method according to Matteucci, et al., J. Am.
Chem. Soc. 103:3185 (1981), both incorporated herein by
reference, or by other chemical methods using either a
commercial automated oligonucleotide synthesizer or
VLSIPS™ technology (discussed in detail below). When
oligonucleotides are referred to as "double-stranded," it is
understood by those of skill in the art that a pair of
oligonucleotides exist in a hydrogen-bonded, helical array 
typically associated with, for example, DNA. In addition to 
the 100% complementary form of double-stranded oligo- 
nucleotides, the term "double-stranded" as used herein is 
also meant to refer to those forms which include such 
structural features as bulges and loops, described more fully 
in such biochemistry texts as Stryer, Biochemistry, Third
Ed., (1988), previously incorporated herein by reference for 
all purposes. 

Receptor: A molecule that has an affinity for a given
ligand or probe. Receptors may be naturally-occurring or
man made molecules. Also, they can be employed in their
unaltered natural or isolated state or as aggregates with other
species. Receptors may be attached, covalently or nonco-
valently, to a binding member, either directly or via a
specific binding substance. Examples of receptors which can
be employed by this invention include, but are not restricted
to, antibodies, cell membrane receptors, monoclonal anti-
bodies and antisera reactive with specific antigenic deter-
minants (such as on viruses, cells or other materials), drugs,
polynucleotides, nucleic acids, peptides, cofactors, lectins,
sugars, polysaccharides, cells, cellular membranes, and
organelles. Receptors are sometimes referred to in the art as
anti-ligands. As the term receptors is used herein, no differ-
ence in meaning is intended. A "ligand-receptor pair" is
formed when two molecules have combined through
molecular recognition to form a complex. Other examples of
receptors which can be investigated by this invention
include but are not restricted to:

a) Microorganism receptors: Determination of ligands or
probes that bind to receptors, such as specific transport
proteins or enzymes essential to survival of microor-
ganisms, is useful in developing a new class of antibiotics. Of
particular value would be antibiotics against opportu-
nistic fungi, protozoa, and those bacteria resistant to the
antibiotics in current use.

b) Enzymes: For instance, the binding site of enzymes 
such as the enzymes responsible for cleaving neu- 
rotransmitters. Determination of ligands or probes that
bind to certain receptors, and thus modulate the action
of the enzymes that cleave the different neurotransmit-
ters, is useful in the development of drugs that can be
used in the treatment of disorders of neurotransmission.

c) Antibodies: For instance, the invention may be useful 
in investigating the ligand-binding site on the antibody
molecule which combines with the epitope of an anti-
gen of interest. Determining a sequence that mimics an
antigenic epitope may lead to the development of
vaccines of which the immunogen is based on one or
more of such sequences, or lead to the development of
related diagnostic agents or compounds useful in thera-
peutic treatments such as for autoimmune diseases
(e.g., by blocking the binding of the "self" antibodies).

d) Nucleic Acids: The invention may be useful in inves- 
tigating sequences of nucleic acids acting as binding
sites for cellular proteins ("trans-acting factors"). Such
sequences may include, e.g., transcription factors, sup- 
pressors, enhancers or promoter sequences. 

e) Catalytic Polypeptides: Polymers, preferably polypep-
tides, which are capable of promoting a chemical
reaction involving the conversion of one or more
reactants to one or more products. Such polypeptides
generally include a binding site specific for at least one
reactant or reaction intermediate and an active func-
tionality proximate to the binding site, which function-
ality is capable of chemically modifying the bound
reactant. Catalytic polypeptides are described in,
Lerner, R.A. et al., Science 252: 659 (1991), which is
incorporated herein by reference.

f) Hormone receptors: For instance, the receptors for
insulin and growth hormone. Determination of the
ligands which bind with high affinity to a receptor is
useful in the development of, for example, an oral
replacement of the daily injections which diabetics
must take to relieve the symptoms of diabetes, and in
the other case, a replacement for the scarce human
growth hormone that can only be obtained from cadav-
ers or by recombinant DNA technology. Other
examples are the vasoconstrictive hormone receptors;
determination of those ligands that bind to a receptor
may lead to the development of drugs to control blood
pressure.

g) Opiate receptors: Determination of ligands that bind to
the opiate receptors in the brain is useful in the devel- 
opment of less-addictive replacements for morphine 
and related drugs. 

Substrate or Solid Support: A material having a rigid or 
semi-rigid surface. Such materials will preferably take the 
form of plates or slides, small beads, pellets, disks or other 
convenient forms, although other forms may be used. In 
some embodiments, at least one surface of the substrate will 
be substantially flat. In other embodiments, a roughly spheri-
cal shape is preferred.

Synthetic: Produced by in vitro chemical or enzymatic 
synthesis. The synthetic libraries of the present invention 
may be contrasted with those in viral or plasmid vectors, for
instance, which may be propagated in bacterial, yeast, or 
other living hosts. 


DESCRIPTION OF THE INVENTION 

The broad concept of the present invention is illustrated in 
FIGS. 1A to 1F. FIGS. 1A, 1B and 1C illustrate the prepa-
ration of surface-bound unimolecular double stranded DNA,
while FIGS. 1D, 1E, and 1F illustrate uses for the libraries
of the present invention. 

FIG. 1A shows a solid support 1 having an attached spacer
2, which is optional. Attached to the distal end of the spacer
is a first oligomer 3, which can be attached as a single unit
or synthesized on the support or spacer in a monomer by
monomer approach. FIG. 1B shows a subsequent stage in
the preparation of one member of a library according to the
present invention. In this stage, a flexible linker 4 is attached
to the distal end of the oligomer 3. In other embodiments, the
flexible linker will be a probe. FIG. 1C shows the completed
surface-bound unimolecular double stranded DNA which is
one member of a library, wherein a second oligomer 5 is now
attached to the distal end of the flexible linker (or probe). As
shown in FIG. 1C, the length of the flexible linker (or probe)
4 is sufficient such that the first and second oligomers (which
are complementary) exist in a double-stranded conforma-
tion. It will be appreciated by one of skill in the art, that the
libraries of the present invention will contain multiple,
individually synthesized members which can be screened for
various types of activity. Three such binding events are
illustrated in FIGS. 1D, 1E and 1F.

In FIG. 1D, a receptor 6, which can be a protein, RNA
molecule or other molecule which is known to bind to DNA,
is introduced to the library. Determining which member of
a library binds to the receptor provides information which is
useful for diagnosing diseases, sequencing DNA or RNA,
identifying genetic characteristics, or in drug discovery.

In FIG. 1E, the linker 4 is a probe for which binding
information is sought. The probe is held in a conformation-
ally restricted manner by the flanking oligomers 3 and 5,
which are present in a double-stranded conformation. As a
result, a library of conformationally restricted probes can be
screened for binding activity with a receptor 7 which has 
specificity for the probe. 

The present invention also contemplates the preparation 
of libraries of unimolecular, double-stranded oligonucle-
otides having bulges or loops in one of the strands as
depicted in FIG. 1F. In FIG. 1F, one oligonucleotide 5 is
shown as having a bulge 8. Specific RNA bulges are often
recognized by proteins (e.g., TAR RNA is recognized by the
TAT protein of HIV). Accordingly, libraries of RNA bulges
or loops are useful in a number of diagnostic applications.
One of skill in the art will appreciate that the bulge or loop
can be present in either oligonucleotide portion 3 or 5.
Libraries of Unimolecular, Double-Stranded Oligonucleotides

In one aspect, the present invention provides libraries of 
unimolecular double-stranded oligonucleotides, each mem-
ber of the library having the formula:

Y-L¹-X¹-L²-X²

in which Y represents a solid support, X¹ and X² represent
a pair of complementary oligonucleotides, L¹ represents a
bond or a spacer, and L² represents a linking group having
sufficient length such that X¹ and X² form a double stranded
oligonucleotide. 

The solid support may be biological, nonbiological, 
organic, inorganic, or a combination of any of these, existing 
as particles, strands, precipitates, gels, sheets, tubing,
spheres, containers, capillaries, pads, slices, films, plates,
slides, etc. The solid support is preferably flat but may take
on alternative surface configurations. For example, the solid
support may contain raised or depressed regions on which
synthesis takes place. In some embodiments, the solid
support will be chosen to provide appropriate light-absorb-
ing characteristics. For example, the support may be a
polymerized Langmuir Blodgett film, functionalized glass,
Si, Ge, GaAs, GaP, SiO₂, SiN₄, modified silicon, or any one
of a variety of gels or polymers such as (poly)tetrafluoro-
ethylene, (poly)vinylidenedifluoride, polystyrene, polycar-
bonate, or combinations thereof. Other suitable solid support
materials will be readily apparent to those of skill in the art.
Preferably, the surface of the solid support will contain
reactive groups, which could be carboxyl, amino, hydroxyl,
thiol, or the like. More preferably, the surface will be
optically transparent and will have surface Si-OH func-
tionalities, such as are found on silica surfaces.

Attached to the solid support is an optional spacer, L¹. The
spacer molecules are preferably of sufficient length to permit
the double-stranded oligonucleotides in the completed mem-
ber of the library to interact freely with molecules exposed
to the library. The spacer molecules, when present, are
typically 6-50 atoms long to provide sufficient exposure for
the attached double-stranded DNA molecule. The spacer, L¹,
is comprised of a surface attaching portion and a longer
chain portion. The surface attaching portion is that part of L¹
which is directly attached to the solid support. This portion
can be attached to the solid support via carbon-carbon bonds
using, for example, supports having (poly)trifluorochloro-
ethylene surfaces, or preferably, by siloxane bonds (using,
for example, glass or silicon oxide as the solid support).
Siloxane bonds with the surface of the support are formed in
one embodiment via reactions of surface attaching portions
bearing trichlorosilyl or trialkoxysilyl groups. The surface
attaching groups will also have a site for attachment of the
longer chain portion. For example, groups which are suitable
for attachment to a longer chain portion would include
amines, hydroxyl, thiol, and carboxyl. Preferred surface
attaching portions include aminoalkylsilanes and hydroxy-
alkylsilanes. In particularly preferred embodiments, the sur-
face attaching portion of L¹ is either bis(2-hydroxyethyl)-
aminopropyltriethoxysilane,
2-hydroxyethylaminopropyltriethoxysilane, aminopropyltri-
ethoxysilane or hydroxypropyltriethoxysilane.

The longer chain portion can be any of a variety of 
molecules which are inert to the subsequent conditions for
polymer synthesis. These longer chain portions will typi-
cally be aryl acetylene, ethylene glycol oligomers containing
2-14 monomer units, diamines, diacids, amino acids, pep-
tides, or combinations thereof. In some embodiments, the
longer chain portion is a polynucleotide. The longer chain 
portion which is to be used as part of L¹ can be selected
based upon its hydrophilic/hydrophobic properties to
improve presentation of the double-stranded oligonucle-
otides to certain receptors, proteins or drugs. The longer
chain portion of L¹ can be constructed of polyethylenegly-
cols, polynucleotides, alkylene, polyalcohol, polyester,
polyamine, polyphosphodiester and combinations thereof.
Additionally, for use in synthesis of the libraries of the
invention, L¹ will typically have a protecting group, attached
to a functional group (i.e., hydroxyl, amino or carboxylic
acid) on the distal or terminal end of the chain portion
(opposite the solid support). After deprotection and cou-
pling, the distal end is covalently bound to an oligomer.

Attached to the distal end of L¹ is an oligonucleotide, X¹,
which is a single-stranded DNA or RNA molecule. The
oligonucleotides which are part of the present invention are
typically of from about 4 to about 100 nucleotides in length.
Preferably, X¹ is an oligonucleotide which is about 6 to
about 30 nucleotides in length. The oligonucleotide is typi-
cally linked to L¹ via the 3'-hydroxyl group of the oligo-
nucleotide and a functional group on L¹ which results in the
formation of an ether, ester, carbamate or phosphate ester
linkage.

of the compounds of the invention, the linking group will be
provided with functional groups which can be suitably
protected or activated. The linking group will be covalently
attached to each of the complementary oligonucleotides, X¹
and X², by means of an ether, ester, carbamate, phosphate
ester or amine linkage. The flexible linking group L² will be
attached to the 5'-hydroxyl of the terminal monomer of X¹
and to the 3'-hydroxyl of the initial monomer of X². Pre-
ferred linkages are phosphate ester linkages which can be
formed in the same manner as the oligonucleotide linkages
which are present in X¹ and X². For example, hexaethyl-
eneglycol can be protected on one terminus with a photo-
labile protecting group (i.e., NVOC or MeNPOC) and
activated on the other terminus with 2-cyanoethyl-N,N-
diisopropylamino-chlorophosphite to form a phosphoramid-
ite. This linking group can then be used for construction of
the libraries in the same manner as the photolabile-protected,
phosphoramidite-activated nucleotides. Alternatively, ester
linkages to X¹ and X² can be formed when the L² has
terminal carboxylic acid moieties (using the 5'-hydroxyl of
X¹ and the 3'-hydroxyl of X²). Other methods of forming
ether, carbamate or amine linkages are known to those of
skill in the art and particular reagents and references can be
found in such texts as March, Advanced Organic Chemistry,
4th Ed., Wiley-Interscience, New York, N.Y., 1992, incor-
porated herein by reference.

The oligonucleotide, X², which is covalently attached to
the distal end of the linking group is, like X¹, a single-
stranded DNA or RNA molecule. The oligonucleotides
which are part of the present invention are typically of from
about 4 to about 100 nucleotides in length. Preferably, X² is
an oligonucleotide which is about 6 to about 30 nucleotides
in length and exhibits complementarity to X¹ of from 90 to
100%. More preferably, X¹ and X² are 100% complemen-
tary. In one group of embodiments, either X¹ or X² will
further comprise a bulge or loop portion and exhibit comple-
mentarity of from 90 to 100% over the remainder of the
oligonucleotide.

In a particularly preferred embodiment, the solid support
is a silica support, the spacer is a polyethyleneglycol con-
jugated to an aminoalkylsilane, the linking group is a
polyethyleneglycol group, and X¹ and X² are complemen-
tary oligonucleotides each comprising of from 6 to 30
nucleic acid monomers.

The library can have virtually any number of different
members, and will be limited only by the number or variety
of compounds desired to be screened in a given application
and by the synthetic capabilities of the practitioner. In one
group of embodiments, the library will have from 2 up to
100 members. In other groups of embodiments, the library

Attached to the distal end of X¹ is a linking group, L²,
which is flexible and of sufficient length that X¹ can effec-
tively hybridize with X². The length of the linker will
typically be a length which is at least the length spanned by 
two nucleotide monomers, and preferably at least four 
nucleotide monomers, while not be so long as to interfere 
with cither the pairing or X 1 and X 2 or any subsequent 
assays. Tne linking group itself will typically be an alkylene 55 
group (of from about 6 to about 24 carbons in length), a 
polyethyleneglycol group (of from about 2 to about 24 
cthyleneglycol monomers in a linear configuration), a poly- 
alcohol group, a polyaminc group (e.g., spermine, spermi- 
dine and polymeric derivatives thereof), a polyester group 60 
(e.g., polyvinyl acrylate) having of from 3 to 15 ethyl 
acrylate monomers in a linear configuration), a polyphos- 
phodiester group, or a polynucleotide (having from about 2 
to about 12 nucleic adds). Preferably, the linking group will 
be a Dolyethylcncglycol group which is at least a tetraeth- 65 in which X" and X" arc complementary ohgonucleoudes 
Ml7 and «A. from about 1 to 4 hexa- and Z is a probe The probe will have sufficient length such 
£^^bl£Sri UnJarray. For use in synthesis that X" and X« form a double-stranded DNA poruon of 



will have between 100 and 10000 members, and between 
10000 and 1000000 members, preferably on a solid support. 
In preferred embodiments, the library will have a density of 
more than 100 members at known locations per cm 1 , pref- 
erably more than 1000 per cm 1 , more preferably more than 
10.000 per cm 2 . 

Libraries or Conformational ly Restricted Probes 

In stili another aspect, the present invenuon provides 
libraries of conformaiionally-restricced probes. Each of the 
members of the library comprises a solid support having an 
optional spacer which Is attached to an oligomer of the 
formula; 

_X n -2-X 1> 
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each member. X 11 and X 11 arc as described above for X 1 and 
X 2 respectively, except thai for the present aspect of the 
invention, each member of the probe library can have the 
same X 1 ' and the same X ,a , and differ only in the probe 
portion. In one group of embodiments, X" and X 12 are 3 
cither a poly-A oligonucleotide or a poly-T oligonucleotide. 
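
The 90 to 100% complementarity figures given for X1 and X2 (and, by
reference, for X11 and X12) can be illustrated with a short
calculation. The following is a minimal sketch only: it assumes DNA
strands of equal length written 5' to 3', ignores any bulge or loop
portion, and counts a position as paired only for Watson-Crick A-T and
G-C pairs. The function names are hypothetical and are not part of the
specification.

```python
# Illustrative sketch: percent complementarity between two DNA strands.
# Assumptions (not from the specification): equal-length strands,
# 5'->3' notation, Watson-Crick pairing only, no bulge or loop.

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs antiparallel with seq."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(x1, x2):
    """Percentage of positions at which x2 pairs with x1."""
    if not x1 or len(x1) != len(x2):
        raise ValueError("this sketch compares non-empty strands of equal length")
    ideal_partner = reverse_complement(x1)
    paired = sum(a == b for a, b in zip(ideal_partner, x2.upper()))
    return 100.0 * paired / len(x1)


if __name__ == "__main__":
    x1 = "ACGTACGTAC"
    print(percent_complementarity(x1, reverse_complement(x1)))  # fully paired
    print(percent_complementarity(x1, "GTACGTACGA"))            # one mismatch in ten
    print(percent_complementarity("AAAAAA", "TTTTTT"))          # poly-A with poly-T
```

Under these assumptions a ten-nucleotide pair with a single mismatch
sits at the lower end of the stated 90 to 100% range.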

As noted above, each member of the library will typically
have a different probe portion. The probes, Z, can be any of
a variety of structures for which receptor-probe binding
information is sought for conformationally-restricted forms.
For example, the probe can be an agonist or antagonist for
a cell membrane receptor, a toxin, venom, viral epitope,
hormone, peptide, enzyme cofactor, drug, protein or
antibody. In one group of embodiments, the probes are different
peptides, each having of from about 4 to about 12 amino
acids. Preferably the probes will be linked via
polyphosphate diesters, although other linkages are also suitable. For
example, the last monomer employed on the X11 chain can
be a 5'-aminopropyl-functionalized phosphoramidite
nucleotide (available from Glen Research, Sterling, Va., USA or
Genosys Biotechnologies, The Woodlands, Tex., USA)
which will provide a synthesis initiation site for the carboxy
to amino synthesis of the peptide probe. Once the peptide
probe is formed, a 3'-succinylated nucleoside (from
Cruachem, Sterling, Va., USA) will be added under peptide
coupling conditions. In yet another group of embodiments,
the probes will be oligonucleotides of from 4 to about 30
nucleic acid monomers which will form a DNA or RNA
hairpin structure. For use in synthesis, the probes can also
have associated functional groups (i.e., hydroxyl, amino,
carboxylic acid, anhydride and derivatives thereof) for
attaching two positions on the probe to each of the
complementary oligonucleotides.

The surface of the solid support is preferably provided
with a spacer molecule, although it will be understood that
the spacer molecules are not elements of this aspect of the
invention. Where present, the spacer molecules will be as
described above for L1.

The libraries of conformationally restricted probes can
also have virtually any number of members. As above, the
number of members will be limited only by design of the
particular screening assay for which the library will be used,
and by the synthetic capabilities of the practitioner. In one
group of embodiments, the library will have from 2 to 100
members. In other groups of embodiments, the library will
have between 100 and 10000 members, and between 10000
and 1000000 members. Also as above, in preferred
embodiments, the library will have a density of more than 100
members at known locations per cm2, preferably more than
1000 per cm2, more preferably more than 10,000 per cm2.
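
The density thresholds above are simple ratios of member count to
support area. As a rough illustration, the sketch below computes
members per cm2 for a rectangular support and reports which of the
stated thresholds it exceeds. The rectangular geometry and the function
names are assumptions made for the example.

```python
# Illustrative sketch: library density on a rectangular support.
# The rectangular layout and function names are assumptions.


def members_per_cm2(n_members, width_cm, height_cm):
    """Return the number of library members per square centimetre."""
    area = width_cm * height_cm
    if area <= 0:
        raise ValueError("support area must be positive")
    return n_members / area


def density_tier(density):
    """Name the highest stated threshold that the density exceeds."""
    for threshold in (10_000, 1_000, 100):
        if density > threshold:
            return f"more than {threshold} per cm2"
    return "100 per cm2 or fewer"


if __name__ == "__main__":
    d = members_per_cm2(1024, 1.0, 1.0)
    print(d, density_tier(d))
```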
Preparation of the Libraries 

The present invention further provides methods for the
preparation of diverse unimolecular, double-stranded
oligonucleotides on a solid support. In one group of
embodiments, the surface of a solid support has a plurality of
preselected regions. An oligonucleotide of from 6 to 30
monomers is formed on each of the preselected regions. A
linking group is then attached to the distal end of each of the
oligonucleotides. Finally, a second oligonucleotide is
formed on the distal end of each linking group such that the
second oligonucleotide is complementary to the
oligonucleotide already present in the same preselected region. The
linking group used will have sufficient length such that the
complementary oligonucleotides form a unimolecular,
double-stranded oligonucleotide. In another group of
embodiments, each chemically distinct member of the
library will be synthesized on a separate solid support.



Libraries on a Single Substrate 
Light-Directed Methods 

For those embodiments using a single solid support, the
oligonucleotides of the present invention can be formed
using a variety of techniques known to those skilled in the
art of polymer synthesis on solid supports. For example,
"light directed" methods (which are one technique in a
family of methods known as VLSIPS™ methods) are
described in U.S. Pat. No. 5,143,854, previously
incorporated by reference. The light directed methods discussed in
the '854 patent involve activating predefined regions of a
substrate or solid support and then contacting the substrate
with a preselected monomer solution. The predefined
regions can be activated with a light source, typically shone
through a mask (much in the manner of photolithography
techniques used in integrated circuit fabrication). Other
regions of the substrate remain inactive because they are
blocked by the mask from illumination and remain
chemically protected. Thus, a light pattern defines which regions
of the substrate react with a given monomer. By repeatedly
activating different sets of predefined regions and contacting
different monomer solutions with the substrate, a diverse
array of polymers is produced on the substrate. Of course,
other steps such as washing unreacted monomer solution
from the substrate can be used as necessary. Other
techniques include mechanical techniques such as those
described in PCT No. 92/10183, U.S. Pat. No. 5,384,261,
also incorporated herein by reference for all purposes. Still
further techniques include bead based techniques such as
those described in PCT US/93/04145, also incorporated
herein by reference, and pin based methods such as those
described in U.S. Pat. No. 5,288,514, also incorporated
herein by reference.
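
The way repeated masking and coupling cycles build up a diverse array
can be sketched in a few lines. This is a toy model under stated
assumptions, not the VLSIPS™ procedure itself: each cycle is
represented as a hypothetical (mask, monomer) pair, where the mask is
the set of region indices exposed to light, and every exposed region is
assumed to couple the monomer with complete efficiency.

```python
# Toy model of light-directed synthesis.
# Assumptions (not from the specification): regions are numbered
# 0..n_regions-1, a mask is the set of illuminated region indices,
# and coupling at illuminated regions is 100% efficient.


def light_directed_synthesis(n_regions, cycles):
    """Return the sequence grown in each region after all cycles.

    cycles is a list of (mask, monomer) pairs. Monomers are appended
    in order of addition, so the first character of each sequence is
    the monomer nearest the support.
    """
    array = [""] * n_regions
    for mask, monomer in cycles:
        for region in mask:
            if not 0 <= region < n_regions:
                raise ValueError(f"mask refers to unknown region {region}")
            # Only illuminated (deprotected) regions react.
            array[region] += monomer
    return array


if __name__ == "__main__":
    cycles = [
        ({0, 1}, "A"),  # first mask exposes regions 0 and 1
        ({2, 3}, "C"),
        ({0, 2}, "G"),
        ({1, 3}, "T"),
    ]
    print(light_directed_synthesis(4, cycles))
```

In this example four masking cycles yield four different dimers, one
per region, which is the sense in which a light pattern defines which
regions react with a given monomer.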

The VLSIPS™ methods are preferred for making the
compounds and libraries of the present invention. The
surface of a solid support, optionally modified with spacers
having photolabile protecting groups such as NVOC and
MeNPOC, is illuminated through a photolithographic mask,
yielding reactive groups (typically hydroxyl groups) in the
illuminated regions. A 3'-O-phosphoramidite activated
deoxynucleoside (protected at the 5'-hydroxyl with a
photolabile protecting group) is then presented to the surface
and chemical coupling occurs at sites that were exposed to
light. Following capping and oxidation, the substrate is
rinsed and the surface illuminated through a second mask, to
expose additional hydroxyl groups for coupling. A second
5'-protected, 3'-O-phosphoramidite activated
deoxynucleoside is presented to the surface. The selective
photodeprotection and coupling cycles are repeated until the desired set
of oligonucleotides is produced. Alternatively, an oligomer
of from, for example, 4 to 30 nucleotides can be added to
each of the preselected regions rather than synthesize each
member in a monomer by monomer approach. At this point
in the synthesis, either a flexible linking group or a probe can
be attached in a similar manner. For example, a flexible
linking group such as polyethylene glycol will typically
have an activating group (i.e., a phosphoramidite) on one
end and a photolabile protecting group attached to the other
end. Suitably derivatized polyethylene glycol linking groups
can be prepared by the methods described in Durand, et al.,
Nucleic Acids Res. 18:6353-6359 (1990). Briefly, a
polyethylene glycol (i.e., hexaethylene glycol) can be
mono-protected using MeNPOC-chloride. Following purification
of the mono-protected glycol, the remaining hydroxy moiety
can be activated with
2-cyanoethyl-N,N-diisopropylaminochlorophosphite. Once the flexible linking group has been
attached to the first oligonucleotide (X1), deprotection and
coupling cycles will proceed using 5'-protected,
3'-O-phosphoramidite activated deoxynucleosides or intact oligomers.
Probes can be attached in a manner similar to that used for
the flexible linking group. When the desired probe is itself
an oligomer, it can be formed either in stepwise fashion on
the immobilized oligonucleotide or it can be separately
synthesized and coupled to the immobilized oligomer in a
single step. For example, preparation of conformationally
restricted β-turn mimetics will typically involve synthesis of
an oligonucleotide as described above, in which the last
nucleoside monomer will be derivatized with an
aminoalkyl-functionalized phosphoramidite. See, U.S. Pat. No.
5,288,514, previously incorporated by reference. The desired
peptide probe is typically formed in the direction from
carboxyl to amine terminus. Subsequent coupling of a
3'-succinylated nucleoside, for example, provides the first
monomer in the construction of the complementary
oligonucleotide strand (which is carried out by the above
methods). Alternatively, a library of probes can be prepared by
first derivatizing a solid support with multiple poly(A) or
poly(T) oligonucleotides which are suitably protected with
photolabile protecting groups, deprotecting at known sites
and constructing the probe at those sites, then coupling the
complementary poly(T) or poly(A) oligonucleotide.

Flow Channel or Spotting Methods

Additional methods applicable to library synthesis on a
single substrate are described in co-pending applications
Ser. No. 07/980,523, filed Nov. 20, 1992, and U.S. Pat. No.
5,384,261, incorporated herein by reference for all purposes.
In the methods disclosed in these applications, reagents are
delivered to the substrate by either (1) flowing within a
channel defined on predefined regions or (2) "spotting" on
predefined regions. However, other approaches, as well as
combinations of spotting and flowing, may be employed. In
each instance, certain activated regions of the substrate are
mechanically separated from other regions when the
monomer solutions are delivered to the various reaction sites.

A typical "flow channel" method applied to the
compounds and libraries of the present invention can generally
be described as follows. Diverse polymer sequences are
synthesized at selected regions of a substrate or solid support
by forming flow channels on a surface of the substrate
through which appropriate reagents flow or in which
appropriate reagents are placed. For example, assume a monomer
"A" is to be bound to the substrate in a first group of selected
regions. If necessary, all or part of the surface of the
substrate in all or a part of the selected regions is activated
for binding by, for example, flowing appropriate reagents
through all or some of the channels, or by washing the entire
substrate with appropriate reagents. After placement of a
channel block on the surface of the substrate, a reagent
having the monomer A flows through or is placed in all or
some of the channel(s). The channels provide fluid contact
to the first selected regions, thereby binding the monomer A
on the substrate directly or indirectly (via a spacer) in the
first selected regions.

Thereafter, a monomer B is coupled to second selected
regions, some of which may be included among the first
selected regions. The second selected regions will be in fluid
contact with a second flow channel(s) through translation,
rotation, or replacement of the channel block on the surface
of the substrate; through opening or closing a selected valve;
or through deposition of a layer of chemical or photoresist.
If necessary, a step is performed for activating at least the
second regions. Thereafter, the monomer B is flowed
through or placed in the second flow channel(s), binding
monomer B at the second selected locations. In this
particular example the resulting sequences bound to the substrate
at this stage of processing will be, for example, A, B, and
AB. The process is repeated to form a vast array of
sequences of desired length at known locations on the
substrate.
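
The A, B, and AB outcome described above can be reproduced with a
small grid model. The sketch below is illustrative only and rests on
several assumptions: the substrate is a rectangular grid of regions, a
channel block covers either whole rows or whole columns, and rotating
the block by 90 degrees switches between the two. The `orientation`
strings and function name are hypothetical.

```python
# Toy model of flow channel synthesis on a rectangular grid of regions.
# Assumptions (not from the specification): channels run along whole
# rows or whole columns, and every region in contact with a channel
# couples the monomer carried by that channel.


def flow_channel_synthesis(n_rows, n_cols, steps):
    """Return a grid of sequences after applying each step in order.

    steps is a list of (orientation, channel_monomers) pairs, where
    orientation is "rows" or "cols" and channel_monomers maps a
    channel index to the monomer flowed through that channel.
    """
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for orientation, channel_monomers in steps:
        if orientation not in ("rows", "cols"):
            raise ValueError("orientation must be 'rows' or 'cols'")
        for channel, monomer in channel_monomers.items():
            for r in range(n_rows):
                for c in range(n_cols):
                    index = r if orientation == "rows" else c
                    if index == channel:
                        grid[r][c] += monomer
    return grid


if __name__ == "__main__":
    steps = [
        ("rows", {0: "A"}),  # monomer A flows through the first row channel
        ("cols", {0: "B"}),  # block rotated; monomer B flows through the first column channel
    ]
    for row in flow_channel_synthesis(2, 2, steps):
        print(row)
```

With one row channel followed by one column channel, the region where
the two channels cross carries AB, the other regions touched by a single
channel carry A or B, and the remaining region is unreacted.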

After the substrate is activated, monomer A can be flowed
through some of the channels, monomer B can be flowed
through other channels, a monomer C can be flowed through
still other channels, etc. In this manner, many or all of the
reaction regions are reacted with a monomer before the
channel block must be moved or the substrate must be
washed and/or reactivated. By making use of many or all of
the available reaction regions simultaneously, the number of
washing and activation steps can be minimized.

One of skill in the art will recognize that there are
alternative methods of forming channels or otherwise
protecting a portion of the surface of the substrate. For example,
according to some embodiments, a protective coating such
as a hydrophilic or hydrophobic coating (depending upon
the nature of the solvent) is utilized over portions of the
substrate to be protected, sometimes in combination with
materials that facilitate wetting by the reactant solution in
other regions. In this manner, the flowing solutions are
further prevented from passing outside of their designated
flow paths.

The "spotting" methods of preparing compounds and
libraries of the present invention can be implemented in
much the same manner as the flow channel methods. For
example, a monomer A can be delivered to and coupled with
a first group of reaction regions which have been
appropriately activated. Thereafter, a monomer B can be delivered to
and reacted with a second group of activated reaction
regions. Unlike the flow channel embodiments described
above, reactants are delivered by directly depositing (rather
than flowing) relatively small quantities of them in selected
regions. In some steps, of course, the entire substrate surface
can be sprayed or otherwise coated with a solution. In
preferred embodiments, a dispenser moves from region to
region, depositing only as much monomer as necessary at
each stop. Typical dispensers include a micropipette to
deliver the monomer solution to the substrate and a robotic
system to control the position of the micropipette with
respect to the substrate, or an ink-jet printer. In other
embodiments, the dispenser includes a series of tubes, a
manifold, an array of pipettes, or the like so that various
reagents can be delivered to the reaction regions
simultaneously.
Pin-Based Methods 

Another method which is useful for the preparation of
compounds and libraries of the present invention involves
"pin based synthesis." This method is described in detail in
U.S. Pat. No. 5,288,514, previously incorporated herein by
reference. The method utilizes a substrate having a plurality
of pins or other extensions. The pins are each inserted
simultaneously into individual reagent containers in a tray.
In a common embodiment, an array of 96 pins/containers is
utilized.

Each tray is filled with a particular reagent for coupling in
a particular chemical reaction on an individual pin.
Accordingly, the trays will often contain different reagents. Since
the chemistry disclosed herein has been established such that
a relatively similar set of reaction conditions may be utilized
to perform each of the reactions, it becomes possible to
conduct multiple chemical coupling steps simultaneously. In
the first step of the process the invention provides for the use
of substrate(s) on which the chemical coupling steps are
conducted. The substrate is optionally provided with a
spacer having active sites. In the particular case of
oligonucleotides, for example, the spacer may be selected from a
wide variety of molecules which can be used in organic
environments associated with synthesis as well as aqueous
environments associated with binding studies. Examples of
suitable spacers are polyethyleneglycols, dicarboxylic acids,
polyamines and alkylenes, substituted with, for example,
methoxy and ethoxy groups. Additionally, the spacers will
have an active site on the distal end. The active sites are
optionally protected initially by protecting groups. Among a
wide variety of protecting groups which are useful are
FMOC, BOC, t-butyl esters, t-butyl ethers, and the like.
Various exemplary protecting groups are described in, for
example, Atherton et al., Solid Phase Peptide Synthesis, IRL
Press (1989), incorporated herein by reference. In some
embodiments, the spacer may provide for a cleavable
function by way of, for example, exposure to acid or base.
Libraries on Multiple Substrates 
Bead Based Methods 

Yet another method which is useful for synthesis of
compounds and libraries of the present invention involves
"bead based synthesis." A general approach for bead based
synthesis is described in copending application Ser. Nos.
07/76Z322 (filed Sep. 18, 1991 now abandoned);
07/946,239 (filed Sep. 16, 1992); 08/146,886 (filed Nov. 2, 1993);
07/876,792 (filed Apr. 29, 1992) and PCT/US93/04145
(filed Apr. 28, 1993), the disclosures of which are
incorporated herein by reference.

For the synthesis of molecules such as oligonucleotides
on beads, a large plurality of beads are suspended in a
suitable carrier (such as water) in a container. The beads are
provided with optional spacer molecules having an active
site. The active site is protected by an optional protecting
group.

In a first step of the synthesis, the beads are divided for
coupling into a plurality of containers. For the purposes of
this brief description, the number of containers will be
limited to three, and the monomers denoted as A, B, C, D,
E, and F. The protecting groups are then removed and a first
portion of the molecule to be synthesized is added to each of
the three containers (i.e., A is added to container 1, B is
added to container 2 and C is added to container 3).

Thereafter, the various beads are appropriately washed of
excess reagents, and remixed in one container. Again, it will
be recognized that by virtue of the large number of beads
utilized at the outset, there will similarly be a large number
of beads randomly dispersed in the container, each having a
particular first portion of the monomer to be synthesized on
a surface thereof.

Thereafter, the various beads are again divided for
coupling in another group of three containers. The beads in the
first container are deprotected and exposed to a second
monomer (D), while the beads in the second and third
containers are coupled to molecule portions E and F
respectively. Accordingly, molecules AD, BD, and CD will be
present in the first container, while AE, BE, and CE will be
present in the second container, and molecules AF, BF, and
CF will be present in the third container. Each bead,
however, will have only a single type of molecule on its surface.
Thus, all of the possible molecules formed from the first
portions A, B, C, and the second portions D, E, and F have
been formed.
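
The divide, couple, and recombine sequence described above can be
traced with a short sketch. It is a simplified model under stated
assumptions: it tracks the distinct molecule types present rather than
individual beads, assumes every type in the pool is represented in every
container after each division, and assumes complete coupling. The
function name is hypothetical.

```python
# Simplified model of bead-based divide/couple/recombine synthesis.
# Assumptions (not from the specification): only distinct molecule
# types are tracked, every type reaches every container when the pool
# is divided, and each coupling goes to completion.


def split_and_pool(rounds):
    """Return (containers, pool) after the final coupling round.

    rounds is a list of monomer lists, one list per coupling round
    and one monomer per container. containers maps each monomer of
    the final round to the molecule types in that container; pool is
    the list of all molecule types after recombining.
    """
    pool = [""]  # bare beads before any coupling
    containers = {}
    for monomers in rounds:
        # Divide the pool, then couple one monomer per container.
        containers = {m: [molecule + m for molecule in pool] for m in monomers}
        # Recombine the beads into a single container.
        pool = [molecule for m in monomers for molecule in containers[m]]
    return containers, pool


if __name__ == "__main__":
    containers, pool = split_and_pool([["A", "B", "C"], ["D", "E", "F"]])
    for monomer, molecules in containers.items():
        print(monomer, molecules)
    print(len(pool), "molecule types in total")
```

Two rounds of three containers give the nine dimers listed in the
text, grouped by container exactly as described for D, E, and F.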

The beads are then recombined into one container and
additional steps such as are conducted to complete the
synthesis of the polymer molecules. In a preferred
embodiment, the beads are tagged with an identifying tag which is
unique to the particular double-stranded oligonucleotide or
probe which is present on each bead. A complete description
of identifier tags for use in synthetic libraries is provided in
co-pending application Ser. No. 08/146,886 (filed Nov. 2,
1993) previously incorporated by reference for all purposes.
Methods of Library Screening 

A library prepared according to any of the methods
described above can be used to screen for receptors having
high affinity for either unimolecular, double-stranded
oligonucleotides or conformationally restricted probes. In one
group of embodiments, a solution containing a marked
(labelled) receptor is introduced to the library and incubated
for a suitable period of time. The library is then washed free
of unbound receptor and the probes or double-stranded
oligonucleotides having high affinity for the receptor are
identified by identifying those regions on the surface of the
library where markers are located. Suitable markers include,
but are not limited to, radiolabels, chromophores,
fluorophores, chemiluminescent moieties, and transition metals.
Alternatively, the presence of receptors may be detected
using a variety of other techniques, such as an assay with a
labelled enzyme, antibody, and the like. Other techniques
using various marker systems for detecting bound receptor
will be readily apparent to those skilled in the art.
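
Identifying the regions where markers are located amounts to comparing
the signal read at each known location against the signal from regions
with no bound receptor. The sketch below shows one simple way this
could be done. The grid of signal values, the background estimate, and
the `min_ratio` cutoff are all assumptions introduced for illustration;
the specification does not prescribe a particular threshold.

```python
# Illustrative sketch: locating library regions that retained marker.
# Assumptions (not from the specification): signal is a grid of
# non-negative readings, one per region, background is a single
# reading for marker-free regions, and a region counts as a hit when
# its signal is at least min_ratio times the background.


def find_hits(signal, background, min_ratio=3.0):
    """Return (row, column) positions whose signal exceeds the cutoff."""
    if background <= 0:
        raise ValueError("background must be positive")
    cutoff = min_ratio * background
    hits = []
    for r, row in enumerate(signal):
        for c, value in enumerate(row):
            if value >= cutoff:
                hits.append((r, c))
    return hits


if __name__ == "__main__":
    signal = [
        [10, 95],
        [12, 400],
    ]
    print(find_hits(signal, background=10))
```

Because each region of the library is at a known location, the
positions returned identify which probes or double-stranded
oligonucleotides bound the receptor.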

In a preferred embodiment, a library prepared on a single
solid support (using, for example, the VLSIPS™ technique)
can be exposed to a solution containing marked receptor
such as a marked antibody. The receptor can be marked in
any of a variety of ways, but in one embodiment marking is
effected with a radioactive label. The marked antibody binds
with high affinity to an immobilized antigen previously
localized on the surface. After washing the surface free of
unbound receptor, the surface is placed proximate to x-ray
film or phosphorimagers to identify the antigens that are
recognized by the antibody. Alternatively, a fluorescent
marker may be provided and detection may be by way of a
charge-coupled device (CCD), fluorescence microscopy or
laser scanning.

When autoradiography is the detection method used, the
marker is a radioactive label, such as 32P. The marker on the
surface is exposed to X-ray film or a phosphorimager, which
is developed and read out on a scanner. An exposure time of
about 1 hour is typical in one embodiment. Fluorescence
detection using a fluorophore label, such as fluorescein,
attached to the receptor will usually require shorter exposure
times.

Quantitative assays for receptor concentrations can also
be performed according to the present invention. In a direct
assay method, the surface containing localized probes
prepared as described above, is incubated with a solution
containing a marked receptor for a suitable period of time.
The surface is then washed free of unbound receptor. The
amount of marker present at predefined regions of the
surface is then measured and can be related to the amount of
receptor in solution. Methods and conditions for performing
such assays are well-known and are presented in, for
example, L. Hood et al., Immunology, Benjamin/Cummings
(1978), and E. Harlow et al., Antibodies. A Laboratory
Manual, Cold Spring Harbor Laboratory, (1988). See, also
U.S. Pat. No. 4,376,110 for methods of performing sandwich
assays. The precise conditions for performing these steps
will be apparent to one skilled in the art.

A competitive assay method for two receptors can also be
employed using the present invention. Methods of
conducting competitive assays are known to those of skill in the art.
One such method involves immobilizing conformationally
restricted probes on predefined regions of a surface as
described above. An unmarked first receptor is then bound
to the probes on the surface having a known specific binding
affinity for the receptors. A solution containing a marked
second receptor is then introduced to the surface and
incubated for a suitable time. The surface is then washed free of
unbound reagents and the amount of marker remaining on
the surface is measured. In another form of competition
assay, marked and unmarked receptors can be exposed to the
surface simultaneously. The amount of marker remaining on
predefined regions of the surface can be related to the
amount of unknown receptor in solution. Yet another form of
competition assay will utilize two receptors having different
labels, for example, two different chromophores.

In other embodiments, in order to detect receptor binding,
the double-stranded oligonucleotides which are formed with
attached probes or with a flexible linking group will be
treated with an intercalating dye, preferably a fluorescent
dye. The library can be scanned to establish a background
fluorescence. After exposure of the library to a receptor
solution, the exposed library will be scanned or illuminated
and examined for those areas in which fluorescence has
changed. Alternatively, the receptor of interest can be
labeled with a fluorescent dye by methods known to those of
skill in the art and incubated with the library of probes. The
library can then be scanned or illuminated, as above, and
examined for areas of fluorescence.

In instances where the libraries are synthesized on beads
in a number of containers, the beads are exposed to a
receptor of interest. In a preferred embodiment the receptor
is fluorescently or radioactively labelled. Thereafter, one or
more beads are identified that exhibit significant levels of,
for example, fluorescence using one of a variety of
techniques. For example, in one embodiment, mechanical
separation under a microscope is utilized. The identity of the
molecule on the surface of such separated beads is then
identified using, for example, NMR, mass spectrometry,
PCR amplification and sequencing of the associated DNA,
or the like. In another embodiment, automated sorting (i.e.,
fluorescence activated cell sorting) can be used to separate
beads (bearing probes) which bind to receptors from those
which do not bind. Typically the beads will be labeled and
identified by methods disclosed in Needels, et al., Proc.
Natl. Acad. Sci., USA 90:10700-10704 (1993), incorporated
herein by reference.

The assay methods described above for the libraries of the
present invention will have tremendous application in such
endeavors as DNA "footprinting" of proteins which bind
DNA. Currently, DNA footprinting is conducted using
DNase I digestion of double-stranded DNA in the presence
of a putative DNA binding protein. Gel analysis of cut and
protected DNA fragments then provides a "footprint" of
where the protein contacts the DNA. This method is both
labor and time intensive. See, Galas et al., Nucleic Acid Res.
5:3157 (1978). Using the above methods, a "footprint" could
be produced using a single array of unimolecular,
double-stranded oligonucleotides in a fraction of the time of
conventional methods. Typically, the protein will be labeled
with a radioactive or fluorescent species and incubated with
a library of unimolecular, double-stranded DNA.
Phosphorimaging or fluorescence detection will provide a footprint
of those regions on the library where the protein has bound.
Alternatively, unlabeled protein can be used. When
unlabeled protein is used, the double-stranded oligonucleotides
in the library will all be labeled with a marker, typically a
fluorescent marker. Incorporation of a marker into each
member of the library can be carried out by terminating the
oligonucleotide synthesis with a commercially available
fluorescing phosphoramidite nucleotide derivative.
Following incubation with the unlabeled protein, the library will be
treated with DNase I and examined for areas which are
protected from cleavage.

The assay methods described above for the libraries of the
present invention can also be used in reverse drug discovery.
In such an application, a compound having known
pharmacological safety or other desired properties (e.g., aspirin)
could be screened against a variety of double-stranded
oligonucleotides for potential binding. If the compound is
shown to bind to a sequence associated with, for example,
tumor suppression, the compound can be further examined
for efficacy in the related diseases.

In other embodiments, probe arrays comprising β-turn
mimetics can be prepared and assayed for activity against a
particular receptor. β-turn mimetics are compounds having
molecular structures similar to β-turns which are one of the
three major components in protein molecular architecture.
β-turns are similar in concept to hairpin turns of
oligonucleotide strands, and are often critical recognition features for
various protein-ligand and protein-protein interactions. As a
result, a library of β-turn mimetic probes can provide or
suggest new therapeutic agents having a particular affinity
for a receptor which will correspond to the affinity exhibited
by the β-turn and its receptor.

Bioelectronic Devices and Methods

In another aspect, the present invention provides a method
for the bioelectronic detection of sequence-specific
oligonucleotide hybridization. A general method and device
which is useful in diagnostics in which a biochemical
species is attached to the surface of a sensor is described in
U.S. Pat. No. 4,562,157 (the Lowe patent), incorporated
herein by reference. The present method utilizes arrays of
immobilized oligonucleotides (prepared, for example, using
VLSIPS™ technology) and the known photo-induced
electron transfer which is mediated by a DNA double helix
structure. See, Murphy et al., Science 262:1025-1029
(1993). This method is useful in hybridization-based
diagnostics, as a replacement for fluorescence-based detection
systems. The method of bioelectronic detection also offers
higher resolution and potentially higher sensitivity than
earlier diagnostic methods involving sequencing/detecting
by hybridization. As a result, this method finds applications
in genetic mutation screening and primary sequencing of
oligonucleotides. The method can also be used for
Sequencing By Hybridization (SBH), which is described in
co-pending application Ser. Nos. 08/082,937 (filed Jun. 25,
1993 now abandoned) and 08/168,904 (filed Dec. 15, 1993),
each of which are incorporated herein by reference for all
purposes. This method uses a set of short oligonucleotide
probes of defined sequence to search for complementary
sequences on a longer target strand of DNA. The
hybridization pattern is used to reconstruct the target DNA
sequence. Thus, the hybridization analysis of large numbers
of probes can be used to sequence long stretches of DNA. In
immediate applications of this hybridization methodology, a
small number of probes can be used to interrogate local
DNA sequence.

In the present inventive method, hybridization is
monitored using bioelectronic detection. In this method, the target
DNA, or first oligonucleotide, is provided with an
electron-donor tag and then incubated with an array of
oligonucleotide probes, each of which bears an electron-acceptor tag
and occupies a known position on the surface of the array.
After hybridization of the first oligonucleotide to the array
has occurred, the hybridized array is illuminated to induce
an electron transfer reaction in the direction of the surface of
the array. The electron transfer reaction is then detected at
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the location on the surface where hybridization has taken 
place. Typically, each of the oligonucleotide probes in an 
amy will have an auached electron-acceptor tag located 
near the surface of the solid support used in preparation of 
the array. In embodiments yn which the arrays are prepared 5 
by light-directed methods <i.c t typically 3' to 5 1 direction), 
the elcctronacccpior lag will be located near the 3' position. 
The electron-acceptor tag can be attached either to the 3' 
monomer by methods known to those of skill in the an, or 
it can be auached to a spacing group between the 3' to 
monomer and the solid support. Such a spacing group will 
have, in addition to functional groups for attachment to the 
solid support and the oligonucleotide, a third functional 
group for attachment of the elcaronacccptor tag. The target 
oligonucleotide will typically have the electron-donor tag 15 
attached at the 3' position. Alternatively, the target oligo- 
nucleotide can be incubated with the array in the absence of 
an electron-donor tag. Following incubation, the electron- 
donor tag can be added in solution. The electron-otanor tag 
will then intercalate into those regions where hybridization 20 
has occurred. An electron transfer reaction can then be 
delected in those regions having a continuous DNA double 
helix. 
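
The sequencing-by-hybridization step described in this section (using the pattern of hybridizing short probes to reconstruct a longer target) can be illustrated with a minimal sketch. It is not the method of the cited applications: it assumes error-free hybridization, probes of a single length k, a known starting probe, and a target in which every (k-1)-base overlap is unique. Each probe is recorded here by the target-strand sequence it detects; the function names are hypothetical.

```python
def spectrum(target, k):
    """Return the set of all length-k subsequences (the probes that would hybridize)."""
    return {target[i:i + k] for i in range(len(target) - k + 1)}


def reconstruct_from_probes(hybridized, k, start):
    """Greedily extend `start` one base at a time using overlapping probes.

    Stops early if no probe, or more than one probe, overlaps the current
    end, because the hybridization pattern is then ambiguous.
    """
    if k < 2:
        raise ValueError("probe length k must be at least 2")
    remaining = set(hybridized)
    remaining.discard(start)
    sequence = start
    while remaining:
        suffix = sequence[-(k - 1):]
        candidates = [p for p in remaining if p.startswith(suffix)]
        if len(candidates) != 1:
            break  # dead end or ambiguous branch
        nxt = candidates[0]
        sequence += nxt[-1]
        remaining.remove(nxt)
    return sequence


if __name__ == "__main__":
    probes = spectrum("ATGCGTAC", 3)
    print(reconstruct_from_probes(probes, 3, "ATG"))
```

Repeated overlaps in a real target make the greedy walk ambiguous, which is one reason large probe sets are needed for long stretches of DNA.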

The electron-donor tag can be any of a variety of complexes which
participate in electron transfer reactions and which can be attached to an
oligonucleotide by a means which does not interfere with the electron
transfer reaction. In preferred embodiments, the electron-donor tag is a
ruthenium (II) complex, more preferably a ruthenium (II) (phen')2(dppz)
complex.

The electron-acceptor tag can be any species which, with the
electron-donor tag, will participate in an electron transfer reaction. An
example of an electron-acceptor tag is a rhodium (III) complex. A
preferred electron-acceptor tag is a rhodium (III) (phi)2(phen') complex.

In a particularly preferred embodiment, the electron-donor tag is a
ruthenium (II) (phen')2(dppz) complex and the electron-acceptor tag is a
rhodium (III) (phi)2(phen') complex.

In still another aspect, the present invention provides a device for the
bioelectronic detection of sequence-specific oligonucleotide
hybridization. The device will typically consist of a sensor having a
surface to which an array of oligonucleotides are attached. The
oligonucleotides will be attached in pre-defined areas on the surface of
the sensor and have an electron-acceptor tag attached to each
oligonucleotide. The electron-acceptor tag will be a tag which is capable
of producing an electron transfer signal upon illumination of a hybridized
species, when the complementary oligonucleotide bears an electron donating
tag. The signal will be in the direction of the sensor surface and be
detected by the sensor.

In a preferred embodiment, the sensor surface will be a silicon-based
surface which can sense the electronic signal induced and, if necessary,
amplify the signal. The metal contacts on which the probes will be
synthesized can be treated with an oxygen plasma prior to synthesis of the
probes to enhance the silane adhesion and concentration on the surface.
The surface will further comprise a multi-gated field effect transistor,
with each gate serving as a sensor and different oligonucleotides attached
to each gate. The oligonucleotides will typically be attached to the metal
contacts on the sensor surface by means of a spacer group.
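
Because each gate carries a different oligonucleotide at a known position, reading the device amounts to mapping gate signals back to probe sequences. The sketch below shows that bookkeeping only; the `Gate` type, the arbitrary signal units, and the fixed threshold are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    """One gate of the multi-gated sensor and the probe attached to it (hypothetical type)."""
    index: int
    probe: str  # probe sequence, written 5'->3'


def hybridized_probes(gates, signals, threshold):
    """Return (gate index, probe) for every gate whose signal meets the threshold.

    `signals` maps gate index to a measured electron transfer signal in
    arbitrary units; gates with no reading are treated as zero signal.
    """
    hits = []
    for gate in gates:
        if signals.get(gate.index, 0.0) >= threshold:
            hits.append((gate.index, gate.probe))
    return hits


if __name__ == "__main__":
    gates = [Gate(0, "ACGTAC"), Gate(1, "TTGACC"), Gate(2, "GGCATA")]
    readings = {0: 0.02, 1: 0.91, 2: 0.75}
    print(hybridized_probes(gates, readings, threshold=0.5))
```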

The spacer group should not be too long, in order to ensure that the
sensing function of the device is easily activated by the binding
interaction and subsequent illumination of the "tagged" hybridized
oligonucleotides. Preferably, the spacer group is from 3 to 12 atoms in
length and will be as described above for the surface modifying portion of
the spacer group, L¹.

The oligonucleotides which are attached to the spacer group can be formed
by any of the solid phase techniques which are known to those of skill in
the art. Preferably, the oligonucleotides are formed one base at a time in
the direction of the 3' terminus to the 5' terminus by the
"light-directed" methods described above. The oligonucleotide can then be
modified at the 3' end to attach the electron-acceptor tag. A number of
suitable methods of attachment are known. For example, modification with
the reagent Aminolink2 (from Applied Biosystems, Inc.) provides a terminal
phosphate moiety which is derivatized with an aminohexyl phosphate ester.
Coupling of a carboxylic acid, which is present on the electron-acceptor
tag, to the amine can then be carried out using HOBT and DCC.
Alternatively, synthesis of the oligonucleotide can begin with a suitably
derivatized and protected monomer which can then be deprotected and
coupled to the electron-acceptor tag once the complete oligonucleotide has
been synthesized.

The silica surface can also be replaced by silicon nitride or oxynitride,
or by an oxide of another metal, especially aluminum, titanium (IV) or
iron (III). The surface can also be any other film, membrane, insulator or
semiconductor overlying the sensor which will not interfere with the
detection of electron transfer and to which an oligonucleotide can be
coupled.

Additionally, detection devices other than an FET can be used. For
example, sensors such as bipolar transistors, MOS transistors and the like
are also useful for the detection of electron transfer signals.

Adhesives

In still another aspect, the present invention provides an adhesive
comprising a pair of surfaces, each having a plurality of attached
oligonucleotides, wherein the single-stranded oligonucleotides on one
surface are complementary to the single-stranded oligonucleotides on the
other surface. The strength and position/orientation specificity can be
controlled using a number of factors including the number and length of
oligonucleotides on each surface, the degree of complementarity, and the
spatial arrangement of complementary oligonucleotides on the surface. For
example, increasing the number and length of the oligonucleotides on each
surface will provide a stronger adhesive. Suitable lengths of
oligonucleotides are typically from about 10 to about 70 nucleotides.
Additionally, the surfaces of oligonucleotides can be prepared such that
adhesion occurs in an extremely position-specific manner by a suitable
arrangement of complementary oligonucleotides in a specific pattern. Small
deviations from the optimum spatial arrangement are energetically
unfavorable as many hybridization bonds must be broken and are not
reformed in any other relative orientation.
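
The position specificity described above can be illustrated with a toy one-dimensional model: two surfaces carry oligonucleotides at successive sites, and a site "bonds" only when the facing oligonucleotides are fully complementary. This is a sketch under strong simplifying assumptions (equal-length oligonucleotides, all-or-nothing pairing, no thermodynamics); the sequences and function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def complementarity(oligo_a, oligo_b):
    """Fraction of positions of oligo_a that pair with oligo_b in antiparallel alignment."""
    if len(oligo_a) != len(oligo_b):
        raise ValueError("this sketch only compares oligonucleotides of equal length")
    partner = reverse_complement(oligo_b)
    matches = sum(x == y for x, y in zip(oligo_a, partner))
    return matches / len(oligo_a)


def bonded_sites(surface_a, surface_b, shift=0):
    """Count sites where site i of surface A faces a fully complementary site i+shift of B."""
    count = 0
    for i, oligo in enumerate(surface_a):
        j = i + shift
        if 0 <= j < len(surface_b) and complementarity(oligo, surface_b[j]) == 1.0:
            count += 1
    return count


if __name__ == "__main__":
    side_a = ["GAATTCGGCA", "ACGTTGCAAT", "TTGACCGTAA"]
    side_b = [reverse_complement(o) for o in side_a]
    print(bonded_sites(side_a, side_b, shift=0), bonded_sites(side_a, side_b, shift=1))
```

In this model every site bonds at the designed registration and none bond after a one-site shift, which mirrors the "self-finding" behavior attributed to patterned surfaces.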

The adhesives of the present invention will find use in numerous
applications. Generally, the adhesives are useful for adhering two
surfaces to one another. More specifically, the adhesives will find
application where biological compatibility of the adhesive is desired. An
example of a biological application involves use in surgical procedures
where tissues must be held in fixed positions during or following the
procedure. In this application, the surfaces of the adhesive will
typically be membranes which are compatible with the tissues to which they
are attached.

A particular advantage of the adhesives of the present invention is that
when they are formed in an orientation specific manner, the adhesive
portions will be "self-finding," that is, the system will go to the
thermodynamic equilibrium in which the two sides are matched in the
predetermined, orientation specific manner.

EXAMPLES

Example 1

This example illustrates the general synthesis of an array of
unimolecular, double-stranded oligonucleotides on a solid support.

Unimolecular double stranded DNA molecules were synthesized on a solid
support using standard light-directed methods (VLSIPS™ protocols). Two
hexaethylene glycol (PEG) linkers were used to covalently attach the
synthesized oligonucleotides to the derivatized glass surface. Synthesis
of the first (inner) strand proceeded one nucleotide at a time using
repeated cycles of photo-deprotection and chemical coupling of protected
nucleotides. The nucleotides each had a protecting group on the base
portion of the monomer as well as a photolabile MeNPoc protecting group on
the 5' hydroxyl. Upon completion of the inner strand another
MeNPoc-protected PEG linker was covalently attached to the 5' end of the
surface-bound oligonucleotide. After addition of the internal PEG linker,
the PEG was photodeprotected, and the synthesis of the second strand
proceeded in the normal fashion. Following the synthesis cycles, the DNA
bases were deprotected using standard protocols. The sequence of the
second (outer) strand, being complementary to that of the inner strand,
provided molecules with short, hydrogen bonded, unimolecular
double-stranded structure as a result of the presence of the internal
flexible PEG linker.

An array of 16 different molecules was synthesized on a derivatized glass
slide in order to determine whether short, unimolecular DNA structures
could be formed on a surface and whether they could adopt structures that
are recognized by proteins. Each of the 16 different molecular species
occupies a different physical region on the glass surface so that there is
a one-to-one correspondence between molecular identity and physical
location. The molecules are of the form

S-P-P-C-C-A/T-A/T-A/T-A/T-G-C-P-G-C-A/T-A/T-A/T-A/T-G-G-F

where S is the solid surface having silyl groups, P is a PEG linker, A, C,
G, and T are the DNA nucleotides, and F is a fluorescent tag. The DNA
sequence is listed from the 3' to the 5' end (the 3' end of the DNA
molecule is attached to the solid surface via a silyl group and 2 PEG
linkers). The sixteen molecules synthesized on the solid support differed
in the various permutations of A and T in the above formula.

Example 2 

This example illustrates the ability of a library of surface-bound,
unimolecular, double-stranded oligonucleotides to exist in duplex form and
to be recognized and bound by a protein.

A library of 16 different members was prepared as described in Example 1.
The 16 molecules all have the same composition (same number of As, Cs, Gs
and Ts), but the order is different. Four of the molecules have an outer
strand that is 100% complementary to the inner strand (these molecules
will be referred to as DS, double-stranded, below). One of the four DS
oligonucleotides has a sequence that is recognized by the restriction
enzyme EcoRI. If the molecule can loop back and form a DNA duplex, it
should be recognized and cut by the restriction enzyme, thereby releasing
the fluorescent tag. Thus, the action of the enzyme provided a functional
test for DNA structure, and also served to demonstrate that these
structures can be recognized at the surface by proteins. The remaining 12
molecules had outer strands that were not complementary to their inner
strands (referred to as SS, single-stranded, below). Of these, three had
an outer strand and three had an inner strand whose sequence was an EcoRI
half-site (the sequence on one strand was correct for the enzyme, but the
other half was not). The solid support with an array of molecules on the
surface is referred to as a "chip" for the purposes of the following
discussion. The presence of fluorescently labelled molecules on the chip
was detected using confocal fluorescence microscopy. The action of various
enzymes was determined by monitoring the change in the amount of
fluorescence from the molecules on the chip surface (e.g., "reading" the
chip) upon treatment with enzymes that can cut the DNA and release the
fluorescent tag at the 5' end. The three different enzymes used to
characterize the structure of the molecules on the chip were:

1) Mung Bean Nuclease: sequence independent, single-strand specific DNA
endonuclease;

2) DNase I: sequence independent, double-strand specific endonuclease;

3) EcoRI: restriction endonuclease that recognizes the sequence (5'-3')
GAATTC in double stranded DNA, and cuts between the G and the first A.

Mung Bean Nuclease and EcoRI were obtained from New England Biolabs, and
DNase I was obtained from Boehringer Mannheim. All enzymes were used at a
concentration of 200 units per mL in the buffer recommended by the
manufacturer. The enzymatic reactions were performed in a 1 mL flow cell
at 22° C., and were typically allowed to proceed for 90 minutes.
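
The DS/SS and EcoRI half-site categories used in this example can be checked mechanically for any member of the library. The sketch below assumes one reading of the Example 1 formula: both strands are listed 3' to 5' as written, and the outer strand folds back antiparallel onto the inner strand. The specific sequences are hypothetical picks that fit the formula, not the sequences actually synthesized.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence, written 5'->3'


def classify_member(inner_3to5, outer_3to5):
    """Classify one library member given both strands as listed in the formula (3'->5').

    Returns whether the outer strand is fully complementary to the inner
    strand (DS) and whether each strand contains the EcoRI sequence.
    """
    inner = inner_3to5[::-1]  # rewrite 5'->3'
    outer = outer_3to5[::-1]  # rewrite 5'->3'
    inner_partner = "".join(COMPLEMENT[b] for b in reversed(inner))
    return {
        "duplex": outer == inner_partner,
        "inner_site": ECORI_SITE in inner,
        "outer_site": ECORI_SITE in outer,
    }


if __name__ == "__main__":
    # Hypothetical DS member carrying the full EcoRI site on both strands.
    print(classify_member("CCTTAAGC", "GCTTAAGG"))
    # Hypothetical SS member with the EcoRI half-site on the outer strand only.
    print(classify_member("CCTATAGC", "GCTTAAGG"))
```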

Upon treatment of the chip with the enzyme EcoRI, the fluorescence signal
in the DS EcoRI region and the 3 SS regions with the EcoRI half-site on
the outer strand was reduced by about 10% of its initial value. This
reduction was at least 5 times greater than for the other regions of the
chip, indicating that the action of the enzyme is sequence specific on the
chip. It was not possible to determine if the factor is greater than 5 in
these preliminary experiments because of uncertainty in the constancy of
the fluorescence background. However, because the purpose of these early
experiments was to determine whether unimolecular double-stranded
structures could be formed and whether they could be specifically
recognized by proteins (and not to provide a quantitative measure of
enzyme specificity), qualitative differences between the different
synthesis regions were sufficient.

The reduction in signal in the 3 SS regions with the EcoRI half-site on
the outer strand indicated either that the enzyme cuts single-stranded DNA
with a particular sequence, or that these molecules formed a
double-stranded structure that was recognized by the enzyme. The molecules
on the chip surface were at a relatively high density, with an average
spacing of approximately 100 angstroms. Thus, it was possible for the
outer strand of one molecule to form a double-stranded structure with the
outer strand of a neighboring molecule. In the case of the 3 SS regions
with the EcoRI half-site on the outer strand, such a bimolecular
double-stranded region would have the correct sequence and structure to be
recognized by EcoRI. However, it would differ from the unimolecular
double-stranded molecules in that the inner strand remains single-stranded
and thus amenable to cleavage by a single-strand specific endonuclease
such as Mung Bean Nuclease. Therefore, it was possible to distinguish
unimolecular from bimolecular double-stranded DNA molecules on the surface
by their ability to be cut by single and double-strand specific
endonucleases.

In order to remove all molecules that have single-stranded structures and
to identify unimolecular double-stranded molecules, the chip was first
exhaustively treated with Mung Bean Nuclease. The reduction in the
fluorescence signal was greater by about a factor of 2 for the SS regions
of the chip, including those with the EcoRI half-site on the outer strand
that were cleaved by EcoRI, than for the 4 DS regions. Following Mung Bean
Nuclease treatment, the chip was treated with either DNase I (which cuts
all remaining double-stranded molecules) or EcoRI (which should cut only
the remaining double-stranded molecules with the correct sequence). Upon
treatment with DNase I, the fluorescence signal in the 4 DS regions was
reduced by at least 5-fold more than the signal in the SS regions. Upon
EcoRI treatment, the signal in the single DS region with the correct EcoRI
sequence was reduced by at least a factor of 3 more than the signal in any
other region on the chip. Taken together, these results indicated that the
surface-bound molecules synthesized with two complementary strands
separated by a flexible PEG linker formed intramolecular double-stranded
structures that were resistant to a single-strand specific endonuclease
and were recognized by both a double-strand specific endonuclease and a
sequence-specific restriction enzyme.
What is claimed is: 

1. A synthetic unimolecular, double-stranded oligonucleotide library
comprising a plurality of different members, each member having the
formula:

Y-L¹-X¹-L²-X²

wherein,
Y is a solid support;
X¹ and X² are a pair of complementary oligonucleotides;
L¹ is a spacer;
L² is a linking group having sufficient length such that X¹ and X² form a
double-stranded oligonucleotide.

2. A library in accordance with claim 1, wherein L² is a polyethylene
glycol group.

3. A library in accordance with claim 1, wherein X¹ and X² are
complementary oligonucleotides each comprising of from 6 to 30 nucleic
acid monomers.

4. A library in accordance with claim 1, wherein said solid support is a
silica support and L¹ comprises an aminoalkylsilane and from 1 to 4
hexaethyleneglycols.

5. A library in accordance with claim 1, wherein said solid support is a
silica support, L¹ comprises an aminoalkylsilane and from 1 to 4
hexaethyleneglycols, L² is a polyethyleneglycol group and X¹ and X² are
complementary oligonucleotides each comprising of from 6 to 30 nucleic
acid monomers.

6. A synthetic unimolecular, double-stranded oligonucleotide library of
claim 1, wherein a portion of said double-stranded oligonucleotides formed
by X¹ and X² further comprise a loop.

ARRAYS OF MATERIALS ATTACHED TO A SUBSTRATE

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a division of U.S. patent application Ser. No.
08/390,272, filed Feb. 16, 1995, now U.S. Pat. No. 5,489,678, which is a
continuation of U.S. patent application Ser. No. 07/624,120, filed Dec. 6,
1990, now abandoned, which is a continuation-in-part of U.S. patent
application Ser. No. 07/492,462, filed Mar. 7, 1990, now U.S. Pat. No.
5,143,854, which is a continuation-in-part of U.S. patent application Ser.
No. 07/362,901, filed Jun. 7, 1989, now abandoned, and hereby incorporated
herein by reference for all purposes. This application is also a
continuation-in-part of U.S. patent application Ser. No. 08/456,887, filed
Jan. 1, 1995, which is a division of U.S. patent application Ser. No.
07/954,646, filed Sep. 30, 1992, now U.S. Pat. No. 5,445,934, which is a
division of U.S. patent application Ser. No. 07/850,356, filed Mar. 12,
1992, now U.S. Pat. No. 5,405,783, which is a division of U.S. patent
application Ser. No. 07/492,462, filed Mar. 7, 1990, now U.S. Pat. No.
5,143,854, which is a continuation-in-part of U.S. patent application Ser.
No. 07/362,901, filed Jun. 7, 1989, now abandoned.

This application is also related to U.S. patent application Ser. No.
08/670,118, filed Jun. 25, 1996, which is a division of U.S. patent
application Ser. No. 08/168,104, filed Dec. 15, 1993, which is a
continuation of U.S. patent application Ser. No. 07/624,114, filed Dec. 6,
1990, now abandoned, and U.S. patent application Ser. No. 07/626,730,
filed Dec. 6, 1990, now U.S. Pat. No. 5,347,839, and also incorporated
herein by reference for all purposes.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material
which is subject to copyright protection. The copyright owner has no
objection to the facsimile reproduction by anyone of the patent document
or the patent disclosure as it appears in the Patent and Trademark Office
patent file or records, but otherwise reserves all copyright rights
whatsoever.

BACKGROUND OF THE INVENTION

The present invention relates to the field of polymer synthesis. More
specifically, the invention provides a reactor system, a masking strategy,
photoremovable protective groups, data collection and processing
techniques, and applications for light directed synthesis of diverse
polymer sequences on substrates.

SUMMARY OF THE INVENTION

Methods, apparatus, and compositions for synthesis and use of diverse
polymer sequences on a substrate are disclosed, as well as applications
thereof.

According to one aspect of the invention, an improved reactor system for
synthesis of diverse polymer sequences on a substrate is provided.
According to this embodiment the invention provides for a reactor for
contacting reaction fluids to a substrate; a system for delivering
selected reaction fluids to the reactor; a translation stage for moving a
mask or substrate from at least a first relative location relative to a
second relative location; a light for illuminating the substrate through a
mask at select times; and an appropriately programmed digital computer for
selectively directing a flow of fluids from the reactor system,
selectively activating the translation stage, and selectively illuminating
the substrate so as to form a plurality of diverse polymer sequences on
the substrate at predetermined locations.

The invention also provides a technique for selection of linker molecules
in a very large scale immobilized polymer synthesis (VLSIPS™) method.
According to this aspect of the invention, the invention provides a method
of screening a plurality of linker polymers for use in binding affinity
studies. The invention includes the steps of forming a plurality of linker
polymers on a substrate in selected regions, the linker polymers formed by
the steps of recursively: on a surface of a substrate, irradiating a
portion of the selected regions to remove a protective group, and
contacting the surface with a monomer; contacting the plurality of linker
polymers with a ligand; and contacting the ligand with a labeled receptor.

According to another aspect of the invention, improved photoremovable
protective groups are provided. According to this aspect of the invention
a compound having the formula:

[structural formula not legible in the source]

wherein n=0 or 1; Y is selected from the group consisting of an oxygen of
the carboxyl group of a natural or unnatural amino acid, an amino group of
a natural or unnatural amino acid, or the C-5' oxygen group of a natural
or unnatural deoxyribonucleic or ribonucleic acid; R¹ and R² independently
are a hydrogen atom, a lower alkyl, aryl, benzyl, [...] or alkenyl group
[...].

The invention also provides improved masking techniques for the VLSIPS™
methodology. According to one aspect [...], the invention provides an
ordered method for forming a plurality of polymer sequences by sequential
addition of reagents comprising the step of serially protecting and
deprotecting portions of the plurality of polymer sequences for addition
of other portions of the polymer sequences using a binary synthesis
strategy.

Improved data collection equipment and techniques are also provided.
According to one embodiment, the instrumentation provides a system for
determining affinity of a receptor to a ligand comprising: means for
applying light to a surface of a substrate, the substrate comprising a
plurality of ligands at predetermined locations, the means for providing
simultaneous illumination at a plurality of the predetermined locations;
and an array of detectors for detecting light fluoresced at the plurality
of predetermined locations. The invention further provides for improved
data analysis techniques including the steps of exposing fluorescently
labelled receptors to a substrate, the substrate comprising a plurality of
ligands in regions at known locations; at a plurality of data collection
points [...], determining an amount of light fluoresced from the data
collection points; removing the data collection points deviating from a
predetermined statistical distribution; and determining a relative binding
affinity of the receptor to remaining data collection points.

Protected amino acid N-carboxy anhydrides for use in polymer synthesis are
also disclosed. According to this aspect, the invention provides a
compound having the formula:

[structural formula not legible in the source]

where R is a side chain of a natural or unnatural amino acid and X is a
photoremovable protecting group.

A further understanding of the nature and advantages of the inventions
herein may be realized by reference to the remaining portions of the
specification and the attached drawings.
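
The data analysis steps summarized above (measure fluorescence at several data collection points in a region, remove points that deviate from a predetermined statistical distribution, and derive a relative binding affinity from the rest) can be sketched as follows. The specification does not say here which distribution or cutoff is used; the sketch assumes a simple rule of discarding points more than `max_sigma` standard deviations from the region mean, and expresses affinity as a ratio to a reference region. Both choices, and the function names, are assumptions for illustration.

```python
from statistics import mean, pstdev


def region_intensity(points, max_sigma=2.0):
    """Mean fluorescence of a region after dropping points far from the region mean."""
    mu = mean(points)
    sd = pstdev(points)
    if sd == 0:
        return mu
    kept = [p for p in points if abs(p - mu) <= max_sigma * sd]
    # With a very tight cutoff every point could be dropped; fall back to the raw mean.
    return mean(kept) if kept else mu


def relative_affinity(region_points, reference_points, max_sigma=2.0):
    """Ratio of the cleaned region intensity to the cleaned reference intensity."""
    return (region_intensity(region_points, max_sigma)
            / region_intensity(reference_points, max_sigma))


if __name__ == "__main__":
    # One bright artifact (500) among otherwise consistent counts.
    print(region_intensity([100, 102, 98, 101, 99, 500]))
    print(relative_affinity([200, 202, 198], [100, 101, 99]))
```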

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 schematically illustrates light-directed spatially-addressable
parallel chemical synthesis;

FIG. 2 schematically illustrates one example of light-directed peptide
synthesis;

FIG. 3 is a three-dimensional representation of a portion of the
checkerboard array of YGGFL and PGGFL;

FIG. 4 schematically illustrates an automated system for synthesizing
diverse polymer sequences;

FIGS. 5a and 5b illustrate operation of a program for polymer synthesis;

FIGS. 6a and 6b are a schematic illustration of a "pure" binary masking
strategy;

FIGS. 7a and 7b are a schematic illustration of a gray code binary masking
strategy;

FIGS. 8a and 8b are a schematic illustration of a modified gray code
binary masking strategy;

FIG. 9a schematically illustrates a masking scheme for a four step
synthesis;

FIG. 9b schematically illustrates synthesis of all 400 peptide dimers;

FIG. 10 is a coordinate map for the ten-step binary synthesis;

FIG. 11 schematically illustrates a data collection system;

FIG. 12 is a block diagram illustrating the architecture of the data
collection system;

FIG. 13 is a flow chart illustrating operation of software for the data
collection/analysis system; and

FIG. 14 illustrates a three-dimensional plot of intensity versus position
for light directed synthesis of a dinucleotide.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

CONTENTS

I. Definitions
II. General
  A. Deprotection and Addition
    1. Example
    2. Example
  B. Antibody recognition
    1. Example
III. Synthesis
  A. Reactor System
  B. Binary Synthesis Strategy
    1. Example
    2. Example
    3. Example
    4. Example
    5. Example
    6. Example
  C. Linker Selection
  D. Protecting Groups
    1. Use of Photoremovable Groups During Solid-Phase Synthesis of
       Peptides
    2. Use of Photoremovable Groups During Solid-Phase Synthesis of
       Oligonucleotides
  E. Amino Acid N-Carboxy Anhydrides Protected with a Photoremovable Group
IV. Data Collection
  A. Data Collection System
  B. Data Analysis
V. Other Representative Applications
  A. Oligonucleotide Synthesis
    1. Example
VI. Conclusion

I. DEFINITIONS

Certain terms used herein are intended to have the following general
definitions:

1. Complementary:
Refers to the topological compatibility or matching together of
interacting surfaces of a ligand molecule and its receptor. Thus, the
receptor and its ligand can be described as complementary, and
furthermore, the contact surface characteristics are complementary to each
other.

2. Epitope:
The portion of an antigen molecule which is delineated by the area of
interaction with the subclass of receptors known as antibodies.

3. Ligand:
A ligand is a molecule that is recognized by a particular receptor.
Examples of ligands that can be investigated by this invention include,
but are not restricted to, agonists and antagonists for cell membrane
receptors, toxins and venoms, viral epitopes, hormones, hormone receptors,
peptides, enzymes, enzyme substrates, cofactors, drugs (e.g., opiates,
steroids, etc.), lectins, sugars, oligonucleotides, nucleic acids,
oligosaccharides, proteins, and monoclonal antibodies.

4. Monomer:
A member of the set of small molecules which can be joined together to
form a polymer. The set of monomers includes but is not restricted to, for
example, the set of common L-amino acids, the set of D-amino acids, the
set of synthetic amino acids, the set of nucleotides and the set of
pentoses and hexoses. As used herein, monomers refers to any member of a
basis set for synthesis of a polymer. For example, dimers of the 20
naturally occurring L-amino acids form a basis set of 400 monomers for
synthesis of polypeptides. Different basis sets of monomers may be used at
successive steps in the synthesis of a polymer. Furthermore, each of the
sets may include protected members which are modified after synthesis.
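
The count of 400 in the monomer definition follows from taking every ordered pair of the 20 amino acids (20 x 20). A minimal enumeration, using the standard one-letter amino acid codes and a hypothetical function name, is:

```python
from itertools import product

# Standard one-letter codes for the 20 naturally occurring amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dimer_basis_set(monomers=AMINO_ACIDS):
    """Return every ordered dimer that can be built from the given monomers."""
    return ["".join(pair) for pair in product(monomers, repeat=2)]


if __name__ == "__main__":
    dimers = dimer_basis_set()
    print(len(dimers), dimers[:5])
```

The pairs are ordered, so "GA" and "AG" are counted as different members of the basis set.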

5. Peptide: 

A polymer in which the monomers are alpha amino adds 
and which are joined together through amide bonds and 
65 alternatrvdy referred to as a polypeptide. In the context-of 
this specification it should be appreciated that the amino 
acids may be the L-optical isomer or the D-optical isomer. 
Peptides are often two or more amino acid monomers long,
and often more than 20 amino acid monomers long. Standard
abbreviations for amino acids are used (e.g., P for
proline). These abbreviations are included in Stryer,
which is incorporated herein by reference for all purposes.

6. Radiation:

Energy which may be selectively applied including
energy having a wavelength of between 10⁻¹⁴ and 10⁴
meters including, for example, electron beam radiation,
gamma radiation, x-ray radiation, ultraviolet radiation,
visible light, infrared radiation, microwave radiation, and
radio waves. "Irradiation" refers to the application of
radiation to a surface.

7. Receptor:

A molecule that has an affinity for a given ligand.
Receptors may be naturally-occurring or manmade molecules.
Also, they can be employed in their unaltered state or as
aggregates with other species. Receptors may be attached,
covalently or noncovalently, to a binding member, either
directly or via a specific binding substance. Examples of
receptors which can be employed by this invention include,
but are not restricted to, antibodies, cell membrane
receptors, monoclonal antibodies and antisera reactive with
specific antigenic determinants (such as on viruses, cells or
other materials), drugs, polynucleotides, nucleic acids,
peptides, cofactors, lectins, sugars, polysaccharides, cells,
cellular membranes, and organelles. Receptors are sometimes
referred to in the art as anti-ligands. As the term
receptors is used herein, no difference in meaning is
intended. A "Ligand Receptor Pair" is formed when two
macromolecules have combined through molecular recognition
to form a complex. Other examples of receptors which
can be investigated by this invention include but are not
restricted to:

a) Microorganism receptors:
Determination of ligands which bind to receptors, such
as specific transport proteins or enzymes essential to
survival of microorganisms, is useful in developing
a new class of antibiotics. Of particular value would
be antibiotics against opportunistic fungi, protozoa,
and those bacteria resistant to the antibiotics in
current use.

b) Enzymes:
For instance, one type of receptor is the binding site of
enzymes such as the enzymes responsible for cleaving
neurotransmitters; determination of ligands
which bind to certain receptors to modulate the
action of the enzymes which cleave the different
neurotransmitters is useful in the development of
drugs which can be used in the treatment of disorders
of neurotransmission.

c) Antibodies:
For instance, the invention may be useful in investigating
the ligand-binding site on the antibody molecule
which combines with the epitope of an antigen
of interest; determining a sequence that mimics an
antigenic epitope may lead to the development of
vaccines of which the immunogen is based on one or
more of such sequences or lead to the development
of related diagnostic agents or compounds useful in
therapeutic treatments such as for auto-immune diseases
(e.g., by blocking the binding of the "self"
antibodies).

d) Nucleic Acids:
Sequences of nucleic acids may be synthesized to
establish DNA or RNA binding sequences.

e) Catalytic Polypeptides:
Polymers, preferably polypeptides, which are capable
of promoting a chemical reaction involving the conversion
of one or more reactants to one or more
products. Such polypeptides generally include a
binding site specific for at least one reactant or
reaction intermediate and an active functionality
proximate to the binding site, which functionality is
capable of chemically modifying the bound reactant.
Catalytic polypeptides are described in, for example,
U.S. Pat. No. 5,215,899, which is incorporated
herein by reference for all purposes.

f) Hormone receptors:
Examples of hormone receptors include, e.g., the
receptors for insulin and growth hormone. Determination
of the ligands which bind with high affinity to
a receptor is useful in the development of, for
example, an oral replacement of the daily injections
which diabetics must take to relieve the symptoms of
diabetes, and in the other case, a replacement for the
scarce human growth hormone which can only be
obtained from cadavers or by recombinant DNA
technology. Other examples are the vasoconstrictive
hormone receptors; determination of those ligands
which bind to a receptor may lead to the development
of drugs to control blood pressure.

g) Opiate receptors:
Determination of ligands which bind to the opiate
receptors in the brain is useful in the development of
less-addictive replacements for morphine and related
drugs.

8. Substrate:

A material having a rigid or semi-rigid surface. In many
embodiments, at least one surface of the substrate will be
substantially flat, although in some embodiments it may be
desirable to physically separate synthesis regions for different
polymers with, for example, wells, raised regions, etched
trenches, or the like. According to other embodiments, small
beads may be provided on the surface which may be released
upon completion of the synthesis.

9. Protective Group:

A material which is chemically bound to a monomer unit
and which may be removed upon selective exposure to an
activator such as electromagnetic radiation. Examples of
protective groups with utility herein include those comprising
nitropiperonyl, pyrenylmethoxy-carbonyl, nitroveratryl,
nitrobenzyl, dimethyl dimethoxybenzyl, 5-bromo-7-
nitroindolinyl, o-hydroxy-α-methyl cinnamoyl, and
2-oxymethylene anthraquinone.

10. Predefined Region:

A predefined region is a localized area on a surface which
is, was, or is intended to be activated for formation of a
polymer. The predefined region may have any convenient
shape, e.g., circular, rectangular, elliptical, wedge-shaped,
etc. For the sake of brevity herein, "predefined regions" are
sometimes referred to simply as "regions."

11. Substantially Pure:

A polymer is considered to be "substantially pure" within
a predefined region of a substrate when it exhibits
characteristics that distinguish it from other predefined regions.
Typically, purity will be measured in terms of biological
activity or function as a result of uniform sequence. Such
characteristics will typically be measured by way of binding
with a selected ligand or receptor.

12. Activator refers to an energy source adapted to render a
group active and which is directed from a source to a
predefined location on a substrate. A primary illustration of
an activator is light. Other examples of activators include ion
beams, electric fields, magnetic fields, electron beams, x-ray,
and the like.

13. Binary Synthesis Strategy refers to an ordered strategy
for parallel synthesis of diverse polymer sequences by
sequential addition of reagents which may be represented by
a reactant matrix and a switch matrix, the product of which
is a product matrix. A reactant matrix is a 1×n matrix of the
building blocks to be added. The elements of the switch
matrix are binary numbers. In preferred embodiments, a
binary strategy is one in which at least two successive steps
illuminate half of a region of interest on the substrate. In
most preferred embodiments, binary synthesis refers to a
synthesis strategy which also factors a previous addition
step. For example, a strategy in which a switch matrix for a
masking strategy halves regions that were previously
illuminated, illuminating about half of the previously
illuminated region and protecting the remaining half (while also
protecting about half of previously protected regions and
illuminating about half of previously protected regions). It
will be recognized that binary rounds may be interspersed
with non-binary rounds and that only a portion of a substrate
may be subjected to a binary scheme, but will still be
considered to be a binary masking scheme within the
definition herein. A binary "masking" strategy is a binary
synthesis which uses light to remove protective groups from
materials for addition of other materials such as amino acids.
In preferred embodiments, selected columns of the switch
matrix are arranged in order of increasing binary numbers in
the columns of the switch matrix.
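
The relationship between the reactant matrix, the switch matrix, and the product matrix can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `switch_matrix` and `products` are hypothetical, four building blocks A through D are assumed, and the columns are listed here in decreasing binary order (15 down to 0). It is not the program of the microfiche appendix.

```python
def switch_matrix(n):
    """Return an n-row switch matrix whose columns are the n-bit
    binary numbers 2**n - 1 down to 0 (one column per synthesis site)."""
    columns = [[(value >> (n - 1 - row)) & 1 for row in range(n)]
               for value in range(2 ** n - 1, -1, -1)]
    # Transpose so that each row corresponds to one masking step.
    return [list(row) for row in zip(*columns)]


def products(reactants, switches):
    """Join, in order, the reactants whose switch bit is 1 in each column
    (a 0 means the site stayed protected during that step)."""
    return ["".join(r for r, bit in zip(reactants, column) if bit)
            for column in zip(*switches)]


S = switch_matrix(4)
P = products("ABCD", S)
for row in S:
    print(row)   # each row is the mask for one step
print(P)         # one product per site; the empty string is the null product
```

In this sketch every row of `S` contains as many ones as zeros, so each step illuminates half of the sites, and each row splits the regions set by the preceding row in half.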

14. Linker refers to a molecule or group of molecules
attached to a substrate and spacing a synthesized polymer
from the substrate for exposure/binding to a receptor.

II. General

The present invention provides synthetic strategies and
devices for the creation of large scale chemical diversity.
Solid-phase chemistry, photolabile protecting groups, and
photolithography are brought together to achieve light-
directed spatially-addressable parallel chemical synthesis in
preferred embodiments.

The invention is described herein for purposes of
illustration primarily with regard to the preparation of peptides
and nucleotides, but could readily be applied in the
preparation of other polymers. Such polymers include, for
example, both linear and cyclic polymers of nucleic acids,
polysaccharides, phospholipids, and peptides having either
α-, β-, or ω-amino acids, heteropolymers in which a known
drug is covalently bound to any of the above, polyurethanes,
polyesters, polycarbonates, polyureas, polyamides,
polyethyleneimines, polyarylene sulfides, polysiloxanes,
polyimides, polyacetates, or other polymers which will be
apparent upon review of this disclosure. It will be
recognized further, that illustrations herein are primarily with
reference to C- to N-terminal synthesis, but the invention
could readily be applied to N- to C-terminal synthesis
without departing from the scope of the invention.

A. Deprotection and Addition

The present invention uses a masked light source or other
activator to direct the simultaneous synthesis of many
different chemical compounds. FIG. 1 is a flow chart
illustrating the process of forming chemical compounds according
to one embodiment of the invention. Synthesis occurs on a
solid support 2. A pattern of illumination through a mask 4a
using a light source 6 determines which regions of the
support are activated for chemical coupling. In one preferred
embodiment, activation is accomplished by using light to
remove photolabile protecting groups from selected areas of
the substrate.
After deprotection, a first of a set of building blocks
(indicated by "A" in FIG. 1), each bearing a photolabile
protecting group (indicated by "X") is exposed to the surface
of the substrate and it reacts with regions that were
addressed by light in the preceding step. The substrate is
then illuminated through a second mask 4b, which activates
another region for reaction with a second protected building
block "B". The pattern of masks used in these illuminations
and the sequence of reactants define the ultimate products
and their locations, resulting in diverse sequences at
predefined locations, as shown with the sequences ACEG and
BDFH in the lower portion of FIG. 1. Preferred embodiments
of the invention take advantage of combinatorial
masking strategies to form a large number of compounds in
a small number of chemical steps.
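
The cycle of masked deprotection followed by coupling can be mimicked with a small simulation. The sketch below is illustrative only: the function name `light_directed_synthesis`, the two-site substrate, and the strictly alternating masks are assumptions chosen so that the result matches the ACEG and BDFH sequences mentioned for FIG. 1. The actual mask layouts of the figure are not reproduced.

```python
def light_directed_synthesis(n_sites, steps):
    """Simulate masked deprotection followed by coupling.

    steps is a list of (mask, building_block) pairs. Each mask holds one
    0/1 flag per site, where 1 means the site is illuminated.
    """
    sites = [""] * n_sites
    for mask, block in steps:
        for i, lit in enumerate(mask):
            if lit:
                # Only deprotected sites react. The incoming block carries
                # its own protecting group, so the site is capped again.
                sites[i] += block
    return sites


mask_a, mask_b = (1, 0), (0, 1)   # assumed: two complementary masks
steps = [(mask_a, "A"), (mask_b, "B"), (mask_a, "C"), (mask_b, "D"),
         (mask_a, "E"), (mask_b, "F"), (mask_a, "G"), (mask_b, "H")]
result = light_directed_synthesis(2, steps)
print(result)
```

Eight chemical steps yield two different four-unit sequences at known positions; with combinatorial masks over more sites, the same number of steps yields many more products.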

A high degree of miniaturization is possible because the
density of compounds is determined largely with regard to
spatial addressability of the activator, in one case the
diffraction of light. Each compound is physically accessible
and its position is precisely known. Hence, the array is
spatially-addressable and its interactions with other
molecules can be assessed.

In a particular embodiment shown in FIG. 1, the substrate
contains amino groups that are blocked with a photolabile
protecting group. Amino acid sequences are made accessible
for coupling to a receptor by removal of the photoprotective
groups.

When a polymer sequence to be synthesized is, for
example, a polypeptide, amino groups at the ends of linkers
attached to a glass substrate are derivatized with
nitroveratryloxycarbonyl (NVOC), a photoremovable protecting
group. The linker molecules may be, for example, aryl
acetylene, ethylene glycol oligomers containing from 2-10
monomers, diamines, diacids, amino acids, or combinations
thereof. Photodeprotection is effected by illumination of the
substrate through, for example, a mask wherein the pattern
has transparent regions with dimensions of, for example,
less than 1 cm², 10⁻¹ cm², 10⁻² cm², 10⁻³ cm², 10⁻⁴ cm²,
10⁻⁵ cm², 10⁻⁶ cm², 10⁻⁷ cm², 10⁻⁸ cm², or 10⁻¹⁰ cm². In
a preferred embodiment, the regions are between about
10×10 μm and 500×500 μm. According to some
embodiments, the masks are arranged to produce a
checkerboard array of polymers, although any one of a variety of
geometric configurations may be utilized.
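
To relate the feature sizes quoted above to the areas given in cm², the following short calculation may help. It is a sketch under simplifying assumptions: square regions tiled edge to edge with no gaps on a 1 cm² area, and hypothetical helper names `region_area_cm2` and `sites_per_cm2`.

```python
UM_PER_CM = 10_000  # 1 cm = 10,000 micrometers


def region_area_cm2(edge_um):
    """Area in cm^2 of a square region with the given edge in micrometers."""
    return (edge_um / UM_PER_CM) ** 2


def sites_per_cm2(edge_um):
    """Number of square regions that tile 1 cm^2 with no gaps."""
    return (UM_PER_CM // edge_um) ** 2


for edge in (10, 50, 100, 500):
    print(edge, region_area_cm2(edge), sites_per_cm2(edge))
```

Under these assumptions a 10×10 μm region has an area of about 10⁻⁶ cm², so a million such regions fit in one square centimeter, while 500×500 μm regions give 400 per square centimeter.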
1. Example

In one example of the invention, free amino groups were
fluorescently labelled by treatment of the entire substrate
surface with fluorescein isothiocyanate (FITC) after
photodeprotection. Glass microscope slides were cleaned,
aminated by treatment with 0.1% aminopropyltriethoxysilane in
95% ethanol, and incubated at 110° C. for 20 min. The
aminated surface of the slide was then exposed to a 30 mM
solution of the N-hydroxysuccinimide ester of NVOC-
GABA (nitroveratryloxycarbonyl-γ-amino butyric acid) in
DMF. The NVOC protecting group was photolytically
removed by imaging the 365 nm output from a Hg arc lamp
through a chrome on glass 100 μm checkerboard mask onto
the substrate for 20 min at a power density of 12 mW/cm².
The exposed surface was then treated with 1 mM FITC in
DMF. The substrate surface was scanned in an epi-
fluorescence microscope (Zeiss Axioskop 20) using 488 nm
excitation from an argon ion laser (Spectra-Physics model
2025). The fluorescence emission above 520 nm was
detected by a cooled photomultiplier (Hamamatsu 943-02)
operated in a photon counting mode. Fluorescence intensity
was translated into a color display with red in the highest
intensity and black in the lowest intensity areas. The
presence of a high-contrast fluorescent checkerboard pattern of
100×100 μm elements revealed that free amino groups were
generated in specific regions by spatially-localized
photodeprotection.

2. Example

FIG. 2 is a flow chart illustrating another example of the
invention. Carboxy-activated NVOC-leucine was allowed to
react with an aminated substrate. The carboxy activated
HOBT ester of leucine and other amino acids used in this
synthesis was formed by mixing 0.25 mmol of the NVOC
amino protected amino acid with 37 mg HOBT
(1-hydroxybenzotriazole), 111 mg BOP (benzotriazolyl-n-
oxy-tris (dimethylamino)-phosphoniumhexafluorophosphate)
and 86 μl DIEA (diisopropylethylamine) in
2.5 ml DMF. The NVOC protecting group was removed by
uniform illumination. Carboxy-activated NVOC-
phenylalanine was coupled to the exposed amino groups for
2 hours at room temperature, and then washed with DMF
and methylene chloride. Two unmasked cycles of
photodeprotection and coupling with carboxy-activated NVOC-
glycine were carried out. The surface was then illuminated
through a chrome on glass 50 μm checkerboard pattern mask.
Carboxy-activated Nα-tBOC-O-tButyl-L-tyrosine was then
added. The entire surface was uniformly illuminated to
photolyze the remaining NVOC groups. Finally, carboxy-
activated NVOC-L-proline was added, the NVOC group
was removed by illumination, and the t-BOC and t-butyl
protecting groups were removed with TFA. After removal of
the protecting groups, the surface consisted of a 50 μm
checkerboard array of Tyr-Gly-Gly-Phe-Leu (YGGFL)
(Seq. ID No: 1) and Pro-Gly-Gly-Phe-Leu (PGGFL) (Seq. ID
No: 2).

B. Antibody Recognition

In one preferred embodiment the substrate is used to
determine which of a plurality of amino acid sequences is
recognized by an antibody of interest.

1. EXAMPLE

In one example, the array of pentapeptides in the example
illustrated in FIG. 2 was probed with a mouse monoclonal
antibody directed against β-endorphin. This antibody (called
3E7) is known to bind YGGFL and YGGFM (Seq. ID
No:21) with nanomolar affinity and is discussed in Meo et
al., Proc. Natl. Acad. Sci. USA (1983) 80:4084, which is
incorporated by reference herein for all purposes. This
antibody requires the amino terminal tyrosine for high
affinity binding. The array of peptides formed as described
in FIG. 2 was incubated with a 2 μg/ml mouse monoclonal
antibody (3E7) known to recognize YGGFL. 3E7 does not
bind PGGFL. A second incubation with fluoresceinated goat
anti-mouse antibody labeled the regions that bound 3E7. The
surface was scanned with an epi-fluorescence microscope.
The results showed alternating bright and dark 50 μm
squares indicating that YGGFL and PGGFL were
synthesized in geometric array determined by the mask. A high
contrast (>12:1 intensity ratio) fluorescence checkerboard
image shows that (a) YGGFL and PGGFL were synthesized
in alternate 50 μm squares, (b) YGGFL attached to the
surface is accessible for binding to antibody 3E7, and (c)
antibody 3E7 does not bind to PGGFL.

A three-dimensional representation of the fluorescence
intensity data in a portion of the checkerboard is shown in FIG.
3. This figure shows that the border between synthesis sites
is sharp. The height of each spike in this display is linearly
proportional to the integrated fluorescence intensity in a 2.5
μm pixel. The transition between PGGFL and YGGFL
occurs within two spikes (5 μm). There is little variation in
the fluorescence intensity of different YGGFL squares. The
mean intensity of sixteen YGGFL synthesis sites was
2.03×10⁵ counts and the standard deviation was 9.6×10³
counts.

III. Synthesis

a. Reactor System

FIG. 4 schematically illustrates a device used to synthesize
diverse polymer sequences on a substrate. The
substrate, the area of synthesis, and the area for synthesis of
each individual polymer could be of any size or shape. For
example, squares, ellipsoids, rectangles, triangles, circles, or
portions thereof, along with irregular geometric shapes may
be utilized. Duplicate synthesis areas may also be applied to
a single substrate for purposes of redundancy.

In one embodiment, the predefined regions on the
substrate will have a surface area of between about 1 cm² and
10⁻¹⁰ cm². In some embodiments the regions have areas of
less than about 10⁻¹ cm², 10⁻² cm², 10⁻³ cm², 10⁻⁴ cm²,
10⁻⁵ cm², 10⁻⁶ cm², 10⁻⁷ cm², 10⁻⁸ cm², 10⁻⁹ cm², or
10⁻¹⁰ cm². In a preferred embodiment, the regions are
between about 10×10 μm and 500×500 μm.

In some embodiments a single substrate supports more
than about 10 different monomer sequences and preferably
more than about 100 different monomer sequences, although
in some embodiments more than about 10³, 10⁴, 10⁵, 10⁶,
10⁷, or 10⁸ different sequences are provided on a substrate.
Of course, within a region of the substrate in which a
monomer sequence is synthesized, it is preferred that the
monomer sequence be substantially pure. In some
embodiments, regions of the substrate contain polymer
sequences which are at least about 1%, 5%, 10%, 15%, 20%,
25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%,
95%, 96%, 97%, 98%, or 99% pure.

The device includes an automated peptide synthesizer 401.
The automated peptide synthesizer is a device which flows
selected reagents through a flow cell 402 under the direction
of a computer 404. In a preferred embodiment the
synthesizer is an ABI Peptide Synthesizer, model no. 431A.
The computer may be selected from a wide variety of
computers or discrete logic including, for example, an IBM
PC-XT or similar computer linked with appropriate internal
control systems in the peptide synthesizer. The PC is
provided with signals from the board computer indicative
of, for example, the end of a coupling cycle.

Substrate 406 is mounted on the flow cell, forming a
cavity between the substrate and the flow cell. Selected
reagents flow through this cavity from the peptide
synthesizer at selected times, forming an array of peptides on the
face of the substrate in the cavity. Mounted above the
substrate, and preferably in contact with the substrate, is a
mask 408. Mask 408 is transparent in selected regions to a
selected wavelength of light and is opaque in other regions
to the selected wavelength of light. The mask is illuminated
with a light source 410 such as a UV light source. In one
specific embodiment the light source 410 is a model no.
82420 made by Oriel. The mask is held and translated by an
x-y-z translation stage 412 such as an x-y translation stage
made by Newport Corp. The computer coordinates action of
the peptide synthesizer, x-y translation stage, and light
source. Of course, the invention may be used in some
embodiments with translation of the substrate instead of the
mask.

In operation, the substrate is mounted on the reactor
cavity. The slide, with its surface protected by a suitable
photo removable protective group, is exposed to light at
selected locations by positioning the mask and illuminating
the light source for a desired period of time (such as, for
example, 1 sec to 60 min in the case of peptide synthesis).
A selected peptide or other monomer/polymer is pumped
through the reactor cavity by the peptide synthesizer for
binding at the selected locations on the substrate. After a
selected reaction time (such as about 1 sec to 300 min in the
case of peptide reactions) the monomer is washed from
the system, the mask is appropriately repositioned or
replaced, and the cycle is repeated. In most embodiments of
the invention, reactions may be conducted at or near ambient
temperature.

FIGS. 5a and 5b are flow charts of the software used in
operation of the reactor system. At step 502 the peptide
synthesis software is initialized. At step 504 the system
calibrates positioners on the x-y translation stage and begins
a main loop. At step 506 the system determines which, if
any, of the function keys on the computer have been pressed.
If F1 has been pressed, the system prompts the user for input
of a desired synthesis process. If the user enters F2, the
system allows a user to edit a file for a synthesis process at
step 510. If the user enters F3 the system loads a process
from a disk at step 512. If the user enters F4 the system saves
an entered or edited process to disk at step 514. If the user
selects F5 the current process is displayed at step 516 while
selection of F6 starts the main portion of the program, i.e.,
the actual synthesis according to the selected process. If the
user selects F7 the system displays the location of the
synthesized peptides, while pressing F10 returns the user to
the disk operating system.

FIG. 5b illustrates the synthesis step 518 in greater detail.
The main loop of the program is started in which the system
first moves the mask to a next position at step 526. During
the main loop of the program, necessary chemicals flow
through the reaction cell under the direction of the on-board
computer in the peptide synthesizer. At step 528 the system
then waits for an exposure command and, upon receipt of the
exposure command, exposes the substrate for a desired time
at step 530. When an acknowledge of exposure complete is
received at step 532 the system determines if the process is
complete at step 534 and, if so, waits for additional keyboard
input at step 536 and, thereafter, exits the perform synthesis
process.

A computer program used for operation of the system
described above is included as microfiche Appendix A
(Copyright, 1990, Affymax Technologies N.V., all rights
reserved). The program is written in Turbo C++ (Borland
Int'l) and has been implemented in an IBM compatible
system. The motor control software is adapted from software
produced by Newport Corporation. It will be recognized that
a large variety of programming languages could be utilized
without departing from the scope of the invention herein.
Certain calls are made to a graphics program in "Programmer
Guide to PC and PS2 Video Systems" (Wilton,
Microsoft Press, 1987), which is incorporated herein by
reference for all purposes.

Alignment of the mask is achieved by one of two methods
in preferred embodiments. In a first embodiment the system
relies upon relative alignment of the various components,
which is normally acceptable since x-y-z translation stages
are capable of sufficient accuracy for the purposes herein. In
alternative embodiments, alignment marks on the substrate
are coupled to a CCD device for appropriate alignment.

According to some embodiments, pure reagents are not
added at each step, or complete photolysis of the protective
groups is not provided at each step. According to these
embodiments, multiple products will be formed in each
synthesis site. For example, if the monomers A and B are
mixed during a synthesis step, A and B will bind to
deprotected regions, roughly in proportion to their concentration
in solution. Hence, a mixture of compounds will be formed
in a synthesis region. A substrate formed with mixtures of
compounds in various synthesis regions may be used to
perform, for example, an initial screening of a large number
of compounds, after which a smaller number of compounds
in regions which exhibit high binding affinity are further
screened. Similar results may be obtained by only partially
photolyzing a region, adding a first monomer, re-photolyzing
the same region, and exposing the region to a second
monomer.

b. Binary Synthesis Strategy

In a light-directed chemical synthesis, the products
formed depend on the pattern and order of masks, and on the
order of reactants. To make a set of products there will in
general be "n" possible masking schemes. In preferred
embodiments of the invention herein a binary synthesis
strategy is utilized. The binary synthesis strategy is
illustrated herein primarily with regard to a masking strategy,
although it will be applicable to other polymer synthesis
strategies such as the pin strategy, and the like.

In a binary synthesis strategy, the substrate is irradiated
with a first mask, exposed to a first building block, irradiated
with a second mask, exposed to a second building block, etc.
Each combination of masked irradiation and exposure to a
building block is referred to herein as a "cycle."

In a preferred binary masking scheme, the masks for each
cycle allow irradiation of half of a region of interest on the
substrate and protection of the remaining half of the region
of interest. By "half" it is intended herein not to mean
exactly one-half the region of interest but instead a large
fraction of the region of interest such as from about 30 to 70
percent of the region of interest. It will be understood that
the entire masking scheme need not take a binary form;
instead non-binary cycles may be introduced as desired
between binary cycles.

In preferred embodiments of the binary masking scheme,
a given cycle illuminates only about half of the region which
was illuminated in a previous cycle, while protecting the
remaining half of the illuminated portion from the previous
cycle. Conversely, in such preferred embodiments, a given
cycle illuminates half of the region which was protected in
the previous cycle and protects half the region which was
protected in a previous cycle.

The synthesis strategy is most readily illustrated and
handled in matrix notation. At each synthesis site, the
determination of whether to add a given monomer is a binary
process. Therefore, each product element P_j is given by the
dot product of two vectors, a chemical reactant vector, e.g.,
C=[A,B,C,D], and a binary vector σ_j. Inspection of the
products in the example below for a four-step synthesis
shows that in one four-step synthesis σ_1=[1,0,1,0],
σ_2=[1,0,0,1], σ_3=[0,1,1,0], and σ_4=[0,1,0,1], where a 1
indicates illumination and a 0 indicates protection. Therefore, it
becomes possible to build a "switch matrix" S from the
column vectors σ_j (j=1,k where k is the number of products).

        σ_1  σ_2  σ_3  σ_4
         1    1    0    0
    S =  0    0    1    1
         1    0    1    0
         0    1    0    1

The outcome P of a synthesis is simply P=CS, the product
of the chemical reactant matrix and the switch matrix.

The switch matrix for an n-cycle synthesis yielding k
products has n rows and k columns. An important attribute
of S is that each row specifies a mask. A two-dimensional
mask m_j for the jth chemical step of a synthesis is obtained
directly from the jth row of S by placing the elements s_j1,
. . . s_jk into, for example, a square format. The particular
arrangement below provides a square format although
linear or other arrangements may be utilized.

        s_11 s_12 s_13 s_14
    S = s_21 s_22 s_23 s_24        m_j = s_j1 s_j2
        s_31 s_32 s_33 s_34              s_j3 s_j4
        s_41 s_42 s_43 s_44

Of course, compounds formed in a light-activated
synthesis can be positioned in any defined geometric array. A
square or rectangular matrix is convenient but not required.
The rows of the switch matrix may be transformed into any
convenient array as long as equivalent transformations are
used for each row.

For example, the masks in the four-step synthesis below
are then denoted by:

    m_1 = 1 1    m_2 = 0 0    m_3 = 1 0    m_4 = 0 1
          0 0          1 1          1 0          0 1

where 1 denotes illumination (activation) and 0 denotes no
illumination.

The matrix representation is used to generate a desired set
of products and product maps in preferred embodiments.
Each compound is defined by the product of the chemical
vector and a particular switch vector. Therefore, for each
synthesis address, one simply saves the switch vector,
assembles all of them into a switch matrix, and extracts each
of the rows to form the masks.

In some cases, particular product distributions or a
maximal number of products are desired. For example, for
C=[A,B,C,D], any switch vector (σ_j) consists of four bits.
Sixteen four-bit vectors exist. Hence a maximum of 16
different products can be made by sequential addition of the
reagents [A,B,C,D]. These 16 column vectors can be
assembled in 16! different ways to form a switch matrix. The
order of the column vectors defines the masking patterns,
and, therefore, the spatial ordering of products but not their
makeup. One ordering of these columns gives the following
switch matrix (in which "null" (∅) additions are included in
brackets for the sake of completeness, although such null
additions are elsewhere ignored herein):

        1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
    S = 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0
        1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
        1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0

The columns of S according to this aspect of the invention
are the binary representations of the numbers 15 to 0. The
sixteen products of this binary synthesis are ABCD, ABC,
ABD, AB, ACD, AC, AD, A, BCD, BC, BD, B, CD, C, D,
and ∅ (null). Also note that each of the switch vectors from
the four-step synthesis masks above (and hence the synthesis
products) are present in the four bit binary switch matrix.
(See columns 6, 7, 10, and 11.)

This synthesis procedure provides an easy way for
mapping the completed products. The products in the various
locations on the substrate are simply defined by the columns
of the switch matrix (the first column indicating, for
example, that the product ABCD will be present in the upper
left-hand location of the substrate). Furthermore, if only
selected desired products are to be made, the mask sequence
can be derived by extracting the columns with the desired
sequences. For example, to form the product set ABCD,
ABD, ACD, AD, BCD, BD, CD, and D, the masks are
formed by use of a switch matrix with only the 1st, 3rd, 5th,
7th, 9th, 11th, 13th, and 15th columns arranged into the
switch matrix:

        1 1 1 1 0 0 0 0
    S = 1 1 0 0 1 1 0 0
        1 0 1 0 1 0 1 0
        1 1 1 1 1 1 1 1

To form all of the polymers of length 4, the reactant matrix
[ABCD ABCD ABCD ABCD] is used. The switch matrix will
be formed from a matrix of the binary numbers from 0 to 2¹⁶
arranged in columns. The columns having four monomers
are then selected and arranged into a switch matrix.
Therefore, it is seen that the binary switch matrix in general
will provide a representation of all the products which can
be made from an n-step synthesis, from which the desired
products are then extracted.

The rows of the binary switch matrix will, in preferred
embodiments, have the property that each masking step
illuminates half of the synthesis area. Each masking step
also factors the preceding masking step; that is, half of the
region that was illuminated in the preceding step is again
illuminated, whereas the other half is not. Half of the region
that was unilluminated in the preceding step is also
illuminated, whereas the other half is not. Thus, masking is
recursive. The masks are constructed, as described
previously, by extracting the elements of each row and
placing them in a square array. For example, the four masks
in S for a four-step synthesis are:

    m_1 = 1 1 1 1    m_2 = 1 1 1 1    m_3 = 1 1 0 0    m_4 = 1 0 1 0
          1 1 1 1          0 0 0 0          1 1 0 0          1 0 1 0
          0 0 0 0          1 1 1 1          1 1 0 0          1 0 1 0
          0 0 0 0          0 0 0 0          1 1 0 0          1 0 1 0

The recursive factoring of masks allows the products of a
light-directed synthesis to be represented by a polynomial.
(Some light activated syntheses can only be denoted by
irreducible, i.e., prime polynomials.) For example, the
polynomial corresponding to the top synthesis of FIG. 9a
(discussed below) is

    P=(A+B)(C+D)

A reaction polynomial may be expanded as though it were
an algebraic expression, provided that the order of joining of
reactants X_i and X_j is preserved (X_iX_j ≠ X_jX_i), i.e., the
products are not commutative. The product then is AC+AD+
BC+BD. The polynomial explicitly specifies the reactants
and implicitly specifies the mask for each step. Each pair of
parentheses demarcates a round of synthesis. The chemical
reactants of a round (e.g., A and B) react at nonoverlapping
sites and hence cannot combine with one other. The
synthesis area is divided equally amongst the elements of a round
(eg., A is directed to one-half of the area and B to the other 
half)- Hence, the masks for a round (e.g.. the masks m M and 
mB) are orthogonal and form an arthonormal set The 
polynomial notation also signifies thai each element in a 
round is to be joined to each element of the next round (e.g., 
A with C A with D. B with C and B with D). This is 
accomplished by having m^. overlap m A an m* equally, and 
likewise for m^. Because C and D are elements of a round, 
nv and are orthogonal to each other and form an 
ofthononnal set. 
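
By way of illustration only, the ordered expansion of a reaction polynomial such as (A+B)(C+D)
can be sketched in Python. The function name `expand` and the list-of-rounds encoding are
assumptions made for this sketch and do not appear in the specification.

```python
from itertools import product

def expand(rounds):
    """Expand a product of rounds into ordered products.

    Each round is a list of reactant names. One element is taken from
    every round and joined in round order, so AC is formed but CA is not.
    """
    return ["".join(choice) for choice in product(*rounds)]

products = expand([["A", "B"], ["C", "D"]])
print(" + ".join(products))  # AC + AD + BC + BD
```

Because the rounds are joined strictly left to right, the sketch preserves the ordering rule
under which the products are not commutative.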

The polynomial representation of the binary synthesis described above, in which 16 products
are made from 4 reactants, is

$$P = (A + \varnothing)(B + \varnothing)(C + \varnothing)(D + \varnothing)$$

which gives ABCD, ABC, ABD, AB, ACD, AC, AD, A, BCD, BC, BD, B, CD, C, D, and ∅ when expanded
(with the rule that ∅X=X and X∅=X, and remembering that joining is ordered). In a binary
synthesis, each round contains one reactant and one null (denoted by ∅). Half of the synthesis
area receives the reactant and the other half receives nothing. Each mask overlaps every other
mask equally.
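
As a hedged sketch, the sixteen products of the four-reactant binary synthesis can be generated
from four-bit switch vectors in Python. The names `product_for` and `binary_products` are
hypothetical, and the convention that the first reactant corresponds to the most significant
bit is an assumption chosen to match the order in which the products are listed.

```python
REACTANTS = "ABCD"
NULL = "∅"

def product_for(vector, reactants=REACTANTS):
    """Join the reactants whose switch bit is 1, in order of addition."""
    name = "".join(r for r, bit in zip(reactants, vector) if bit == "1")
    return name or NULL  # a vector of all zeros is the null product

def binary_products(reactants=REACTANTS):
    n = len(reactants)
    # Switch vectors are the n-bit binary representations of 2**n - 1 down to 0.
    vectors = [format(v, f"0{n}b") for v in range(2 ** n - 1, -1, -1)]
    return [product_for(v, reactants) for v in vectors]

print(binary_products())
```

Dropping the unset bits from each vector plays the role of the rule that a null joined to X
gives X.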

Binary rounds and non-binary rounds can be interspersed as desired, as in

$$P = (A + \varnothing)(B)(C + D + \varnothing)(E + F + G)$$

The 18 compounds formed are ABCE, ABCF, ABCG, ABDE, ABDF, ABDG, ABE, ABF, ABG, BCE, BCF, BCG,
BDE, BDF, BDG, BE, BF, and BG. The switch matrix S for this 7-step synthesis is

      111111111000000000
      111111111111111111
      111000000111000000
S  =  000111000000111000
      100100100100100100
      010010010010010010
      001001001001001001

The round denoted by (B) places B in all products because the reaction area was uniformly
activated (the mask for B consisted entirely of 1's).
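
The relationship between a polynomial and its switch matrix can be illustrated with a short
Python sketch. The function `switch_matrix`, the use of `None` for the null, and the
requirement that each reactant name be unique are all assumptions of this sketch.

```python
from itertools import product

NULL = None  # stands for the null (no addition) within a round

def switch_matrix(rounds):
    """Return (reactants, rows, product names) for a product of rounds.

    Assumes every reactant name is used in one round only.
    """
    reactants = [x for rnd in rounds for x in rnd if x is not NULL]
    columns = list(product(*rounds))  # one choice per round, per column
    rows = ["".join("1" if r in col else "0" for col in columns)
            for r in reactants]
    names = ["".join(x for x in col if x is not NULL) for col in columns]
    return reactants, rows, names

rounds = [["A", NULL], ["B"], ["C", "D", NULL], ["E", "F", "G"]]
reactants, rows, names = switch_matrix(rounds)
for r, row in zip(reactants, rows):
    print(r, row)
print(len(names), names)
```

Each row is the mask for one chemical step and each column is the switch vector of one
product, so the row for B consists entirely of 1's.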

The number of compounds k formed in a synthesis consisting of r rounds, in which the ith round
has b_i chemical reactants and z_i nulls, is

$$k = \prod_{i=1}^{r} (b_i + z_i)$$

and the number of chemical steps n is

$$n = \sum_{i=1}^{r} b_i$$

The number of compounds synthesized when b=a and z=0 in all rounds is a^(n/a), compared with
2^n for a binary synthesis. For n=20 and a=5, 625 compounds (all tetramers) would be formed,
compared with 1.049x10^6 compounds in a binary synthesis with the same number of chemical
steps.
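
The counts quoted above can be checked with a minimal Python sketch. The functions `compounds`
and `steps`, and the encoding of each round as a pair (reactants, nulls), are assumptions made
for illustration.

```python
from math import prod

def compounds(rounds):
    """k: product over all rounds of (reactants + nulls)."""
    return prod(b + z for b, z in rounds)

def steps(rounds):
    """n: total number of chemical reactants over all rounds."""
    return sum(b for b, _ in rounds)

mixed = [(1, 1), (1, 0), (2, 1), (3, 0)]  # (A+null)(B)(C+D+null)(E+F+G)
uniform = [(5, 0)] * 4                     # four rounds of five reactants
binary = [(1, 1)] * 20                     # twenty binary rounds

print(compounds(mixed), steps(mixed))      # 18 7
print(compounds(uniform), steps(uniform))  # 625 20
print(compounds(binary), steps(binary))    # 1048576 20
```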

It should also be noted that rounds in a polynomial can be nested, as in

$$P = (A + (B + \varnothing)(C + \varnothing))(D + \varnothing)$$

The products are AD, BCD, BD, CD, D, A, BC, B, C, and ∅.
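
A nested polynomial can be expanded recursively. The following Python sketch assumes a
hypothetical encoding in which a string is a single reactant (the empty string being the null),
a list is a sum of alternatives, and a tuple is an ordered product of factors. The example
polynomial (A + (B + null)(C + null))(D + null) is one nesting that yields the ten products
listed above.

```python
def expand(expr):
    """Expand a nested reaction polynomial into a list of products."""
    if isinstance(expr, str):
        return [expr]
    if isinstance(expr, list):  # sum: collect the alternatives
        return [p for term in expr for p in expand(term)]
    results = [""]              # tuple: ordered joining of factors
    for factor in expr:
        results = [left + right for left in results for right in expand(factor)]
    return results

NULL = ""
poly = (["A", (["B", NULL], ["C", NULL])], ["D", NULL])
print(sorted(p or "∅" for p in expand(poly)))
```

The sketch prints the products in sorted order; the spatial ordering of products depends on the
masks and is not modeled here.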

Binary syntheses are attractive for two reasons. First, they generate the maximal number of
products (2^n) for a given number of chemical steps (n). For four reactants, 16 compounds are
formed in the binary synthesis, whereas only 4 are made when each round has two reactants. A
10-step binary synthesis yields 1,024 compounds, and a 20-step synthesis yields 1,048,576.
Second, products formed in a binary synthesis are a complete nested set with lengths ranging
from 0 to n. All compounds that can be formed by deleting one or more units from the longest
product (the n-mer) are present. Contained within the binary set are the smaller sets that
would be formed from the same reactants using any other set of masks (e.g., AC, AD, BC, and BD
formed in the synthesis shown in FIG. 6 are present in the set of 16 formed by the binary
synthesis). In some cases, however, the experimentally achievable spatial resolution may not
suffice to accommodate all the compounds formed. Therefore, practical limitations may require
one to select a particular subset of the possible switch vectors for a given synthesis.
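
The nested-set property can be checked with a brief Python sketch. The helper names
`deletions` and `binary_set` are assumptions of this sketch, and the null product is
represented by the empty string.

```python
from itertools import combinations

def deletions(nmer):
    """All products obtained by deleting zero or more units, order preserved."""
    return {"".join(c) for k in range(len(nmer) + 1)
            for c in combinations(nmer, k)}

def binary_set(reactants):
    """Products of a binary synthesis, one per n-bit switch vector."""
    n = len(reactants)
    return {"".join(r for i, r in enumerate(reactants) if (v >> (n - 1 - i)) & 1)
            for v in range(2 ** n)}

full = binary_set("ABCD")
print(len(full), full == deletions("ABCD"))  # 16 True
print({"AC", "AD", "BC", "BD"} <= full)      # True
```

The second check confirms that the four products of the two-round synthesis are contained in
the binary set of sixteen.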

1. EXAMPLE

FIG. 6 illustrates a synthesis with a binary masking scheme. The binary masking scheme
provides the greatest number of sequences for a given number of cycles. According to this
embodiment, a mask m1 allows illumination of half of the substrate. The substrate is then
exposed to the building block A, which binds at the illuminated regions.

Thereafter, the mask m2 allows illumination of half of the previously illuminated region,
while protecting half of the previously illuminated region. The building block B is then
added, which binds at the illuminated regions from m2.

The process continues with masks m3, m4, and m5, resulting in the product array shown in the
bottom portion of the figure. The process generates 32 (2 raised to the power of the number of
monomers) sequences with 5 (the number of monomers) cycles.
2. EXAMPLE

FIG. 7 illustrates another preferred binary masking scheme which is referred to herein as the
gray code masking scheme. According to this embodiment, the masks m1 to m5 are selected such
that a side of any given synthesis region is defined by the edge of only one mask. The site at
which the sequence BCDE is formed, for example, has its right edge defined by m5 and its left
side formed by mask m4 (and no other mask is aligned on the sides of this site). Accordingly,
problems created by misalignment, diffusion of light under the mask and the like will be
minimized.
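
The one-edge-per-boundary property can be illustrated with a Python sketch. The sketch assumes
that the sites lie along a single axis in standard reflected Gray-code order, which may differ
from the actual layout of FIG. 7. The function names are hypothetical.

```python
def gray_masks(n):
    """Return n masks over 2**n sites laid out in reflected Gray-code order.

    masks[k][site] is 1 where mask k is open (illuminated) at that site.
    """
    codes = [i ^ (i >> 1) for i in range(2 ** n)]
    return [[(c >> (n - 1 - k)) & 1 for c in codes] for k in range(n)]

def binary_masks(n):
    """Same as gray_masks, but with sites in plain binary counting order."""
    return [[(i >> (n - 1 - k)) & 1 for i in range(2 ** n)] for k in range(n)]

def edges_per_boundary(masks):
    """For each boundary between adjacent sites, count masks with an edge there."""
    sites = len(masks[0])
    return [sum(m[s] != m[s + 1] for m in masks) for s in range(sites - 1)]

print(set(edges_per_boundary(gray_masks(5))))    # {1}
print(max(edges_per_boundary(binary_masks(5))))  # 5
```

In the Gray-code ordering every boundary is defined by exactly one mask, whereas in plain
binary ordering up to five mask edges coincide at a single boundary, which is where
misalignment would matter most.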

3. EXAMPLE

FIG. 8 illustrates another binary masking scheme. According to this scheme, referred to herein
as a modified gray code masking scheme, the number of masks needed is minimized. For example,
the mask m2 could be the same mask as m1 and simply translated laterally. Similarly, the mask
m4 could be the same as mask m3 and simply translated laterally.
4. EXAMPLE

A four-step synthesis is shown in FIG. 9a. The reactants are the ordered set {A,B,C,D}. In the
first cycle, illumination through m1 activates the upper half of the synthesis area. Building
block A is then added to give the distribution 691. Illumination through mask m2 (which
activates the lower half), followed by addition of B yields the next intermediate distribution
604. C is added after illumination through m3 (which activates the left half) giving the
distribution 6#4, and D after illumination through m4 (which activates the right half), to
yield the final product pattern 6#S (AC, AD, BC, BD).
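
The four cycles of this example can be simulated on a 2x2 grid. In the Python sketch below, the
array encoding of the masks (1 marks an illuminated site) and the function name `synthesize`
are assumptions made for illustration.

```python
# Each mask is a 2x2 array over the synthesis area; 1 marks an illuminated site.
STEPS = [
    ("A", [[1, 1], [0, 0]]),  # m1: upper half
    ("B", [[0, 0], [1, 1]]),  # m2: lower half
    ("C", [[1, 0], [1, 0]]),  # m3: left half
    ("D", [[0, 1], [0, 1]]),  # m4: right half
]

def synthesize(steps, rows=2, cols=2):
    """Apply each (building block, mask) cycle in order and return the grid."""
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for block, mask in steps:
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    grid[r][c] += block  # coupling occurs only where illuminated
    return grid

for row in synthesize(STEPS):
    print(row)
```

The sketch yields AC and AD in the upper row and BC and BD in the lower row, matching the
product pattern described above.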
5. EXAMPLE

The above masking strategy for the synthesis may be extended for all 400 dipeptides from the
20 naturally occurring amino acids as shown in FIG. 9b. The synthesis consists of two rounds,
with 20 photolysis and chemical coupling cycles per round. In the first cycle of round 1, mask
1 activates 1/20th of the substrate for coupling with the first of 20 amino acids. Nineteen
subsequent illumination/coupling cycles in round 1 yield a substrate consisting of 20
rectangular stripes each bearing a distinct member of the 20 amino acids. The masks of round 2
are perpendicular to round 1 masks and therefore a single illumination/coupling cycle in round
2 yields 20 dipeptides. The 20 illumination/coupling cycles of round 2 complete the synthesis
of the 400 dipeptides.

6. EXAMPLE

The power of the binary masking strategy can be appreciated by the outcome of a 10-step
synthesis that produced 1,024 peptides. The polynomial expression for this 10-step binary
synthesis was:

$$(f + \varnothing)(Y + \varnothing)(G + \varnothing)(A + \varnothing)(G + \varnothing)(T + \varnothing)(F + \varnothing)(L + \varnothing)(S + \varnothing)(F + \varnothing)$$

Each peptide occupied a 400x400 µm square. A 32x32 peptide array (1,024 peptides, including
the null peptide and 10 peptides of l=1, and a limited number of duplicates) was clearly
evident in a fluorescence scan following side group deprotection and treatment with the
antibody 3E7 and fluoresceinated antibody. Each synthesis site was a 400x400 µm square.

The scan showed a range of fluorescence intensities, from a background value of 3,300 counts
to 22,400 counts in the brightest square (x=20, y=9). Only 13 compounds exhibited an intensity
greater than 12,300 counts. The median value of the array was 4,800 counts.

The identity of each peptide in the array could be determined from its x and y coordinates
(each range from 0 to 31) and the map of FIG. 10. The chemical units at positions 2, 5, 6, 9,
and 10 are specified by the y coordinate and those at positions 1, 3, 4, 7, 8 by the x
coordinate. All but one of the peptides was shorter than 10 residues. For example, the peptide
at x=12 and y=3 is YGAGF (SEQ. ID No:3) (positions 1, 6, 8, 9, and 10 are nulls). YGAFLS (SEQ.
ID No:4), the brightest element of the array, is at x=20 and y=9.

It is often desirable to deduce a binding affinity of a given peptide from the measured
fluorescence intensity. Conceptually, the simplest case is one in which a single peptide binds
to a univalent antibody molecule. The fluorescence scan is carried out after the slide is
washed with buffer for a defined time. The order of fluorescence intensities is then a measure
primarily of the relative dissociation rates of the antibody-peptide complexes. If the on-rate
constants are the same (e.g., if they are diffusion-controlled), the order of fluorescence
intensities will correspond to the order of binding affinities. However, the situation is
sometimes more complex because a bivalent primary antibody and a bivalent secondary antibody
are used. The density of peptides in a synthesis area corresponded to a mean separation of ~7
nm, which would allow multivalent antibody-peptide interactions. Hence, fluorescence
intensities obtained according to the method herein will often be a qualitative indicator of
binding affinity.

Another important consideration is the fidelity of synthesis. Deletions are produced by
incomplete photodeprotection or incomplete coupling. The coupling yield per cycle in these
experiments is typically between 85% and 95%. Implementing the switch matrix by masking is
imperfect because of light diffraction, internal reflection, and scattering. Consequently,
stowaways (chemical units that should not be on board) arise by unintended illumination of
regions that should be dark. A binary synthesis array contains many of the controls needed to
assess the fidelity of a synthesis. For example, the fluorescence signal from a synthesis area
nominally containing a tetrapeptide ABCD could come from a tripeptide deletion impurity such
as ACD. Such an artifact would be ruled out by the finding that the fluorescence intensity of
the ACD site is less than that of the ABCD site.

The fifteen most highly labelled peptides in the array obtained with the synthesis of 1,024
peptides described above were YGAFLS (SEQ. ID No:5), YGAFS (SEQ. ID No:6), YGAFL (SEQ. ID
No:7), YGGFLS (SEQ. ID No:8), YGAF (SEQ. ID No:8), YGALS (SEQ. ID No:9), YGGFS (SEQ. ID
No:10), YGAL (SEQ. ID No:11), YGAFLF (SEQ. ID No:12), YGAF (SEQ. ID No:13), YGAFF (SEQ. ID
No:14), YGGLS (SEQ. ID No:15), YGGFL (SEQ. ID No:16), [...] (SEQ. ID No:17), and YGAFLSF (SEQ.
ID No:[...]). All fifteen begin with YG, which agrees with previous work showing that an
amino-terminal tyrosine is a key determinant of binding. Residue 3 of this set is either A or
G, and residue 4 is either F or L. The exclusion of S and T from these positions is clear cut.
The finding that the preferred sequence is YG (A/G) (F/L) fits nicely with the outcome of a
study in which a very large library of peptides on phage generated by recombinant DNA methods
was screened for binding to antibody 3E7 (see Cwirla et al., Proc. Natl. Acad. Sci. USA (1990)
87:6378, incorporated herein by reference). Additional binary syntheses based on leads from
peptides on phage experiments show that YGAFMQ (SEQ. ID No:18), YGAFM (SEQ. ID No:19), and
YGAPQ (SEQ. ID No:20) give stronger fluorescence signals than does YGGFM, the immunogen used
to obtain antibody 3E7.

Variations on the above masking strategy will be valuable in certain circumstances. For
example, suppose that a "kernel" sequence of interest consists of PQR separated from XYZ and
that the aim is to synthesize peptides in which these units are separated by a variable number
of different residues. The kernel can then be placed in each peptide by using a mask that has
1's everywhere. The polynomial representation of a suitable synthesis is:

$$(P)(Q)(R)(A + \varnothing)(B + \varnothing)(C + \varnothing)(D + \varnothing)(X)(Y)(Z)$$

Sixteen peptides will be formed, ranging in length from the 6-mer PQRXYZ to the 10-mer
PQRABCDXYZ.

Several other masking strategies will also find value in selected circumstances. By using a
particular mask more than once, two or more reactants will appear in the same set of products.
For example, suppose that the mask for an eight-step synthesis is

A  11110000
B  00001111
C  11001100
D  00110011
E  10101010
F  01010101
G  11110000
H  00001111

The products are ACEG, ACFG, ADEG, ADFG, BCEH, BCFH, BDEH, and BDFH. A and G always appear
together because their additions were directed by the same mask, and likewise for B and H.

C. Linker Selection

According to preferred embodiments the linker molecules used as an intermediary between the
synthesized polymers and the substrate are selected for optimum length and/or type for
improved binding interaction with a receptor. According to this aspect of the invention
diverse linkers of
varying length and/or type are synthesized for subsequent attachment of a ligand. Through
variations in the length and type of linker, it becomes possible to optimize the binding
interaction between an immobilized ligand and its receptor.

The degree of binding between a ligand (peptide, inhibitor, hapten, drug, etc.) and its
receptor (enzyme, antibody, etc.) when one of the partners is immobilized on to a substrate
will in some embodiments depend on the accessibility of the receptor in solution to the
immobilized ligand. The accessibility in turn will depend on the length and/or type of linker
molecule employed to immobilize one of the partners. Preferred embodiments of the invention
therefore employ the VLSIPS™ technology described herein to generate an array of, preferably,
inactive or inert linkers of varying length and/or type, using photochemical protecting groups
to selectively expose different regions of the substrate and to build upon chemically-active
groups.

In the simplest embodiment of this concept, the same unit is attached to the substrate in
varying multiples or lengths in known locations on the substrate via VLSIPS™ techniques to
generate an array of polymers of varying length. A single ligand (peptide, drug, hapten, etc.)
is attached to each of them, and an assay is performed with the binding site to evaluate the
degree of binding with a receptor that is known to bind to the ligand. In cases where the
linker length impacts the ability of the receptor to bind to the ligand, varying levels of
binding will be observed. In general, the linker which provides the highest binding will then
be used to assay other ligands synthesized in accordance with the techniques herein.

According to other embodiments the binding between a single ligand/receptor pair is evaluated
for linkers of diverse monomer sequence. According to these embodiments, the linkers are
synthesized in an array in accordance with the techniques herein and have different monomer
sequence (and, optionally, different lengths). Thereafter, all of the linker molecules are
provided with a ligand known to have at least some binding affinity for a given receptor. The
given receptor is then exposed to the ligand and binding affinity is deduced. Linker molecules
which provide adequate binding between the ligand and receptor are then utilized in screening
studies.

D. Protecting Groups

As discussed above, selectively removable protecting groups allow creation of well defined
areas of substrate surface having differing reactivities. Preferably, the protecting groups
are selectively removed from the surface by applying a specific activator, such as
electromagnetic radiation of a specific wavelength and intensity. More preferably, the
specific activator exposes selected areas of surface to remove the protecting groups in the
exposed areas.

Protecting groups of the present invention are used in conjunction with solid phase oligomer
syntheses, such as peptide syntheses using natural or unnatural amino acids, nucleotide
syntheses using deoxyribonucleic and ribonucleic acids, oligosaccharide syntheses, and the
like. In addition to protecting the substrate surface from unwanted reaction, the protecting
groups block a reactive end of the monomer to prevent self-polymerization. For instance,
attachment of a protecting group to the amino terminus of an activated amino acid, such as an
N-hydroxysuccinimide-activated ester of the amino acid, prevents the amino terminus of one
monomer from reacting with the activated ester portion of another during peptide synthesis.
Alternatively, the protecting group may be attached to the carboxyl group of an amino acid to
prevent reaction at this site. Most protecting groups can be attached to either the amino or
the carboxyl group of an amino acid, and the nature of the chemical synthesis will dictate
which reactive group will require a protecting group. Analogously, attachment of a protecting
group to the 5'-hydroxyl group of a nucleoside during synthesis using, for example,
phosphate-triester coupling chemistry, prevents the 5'-hydroxyl of one nucleoside from
reacting with the 3'-activated phosphate-triester of another.

Regardless of the specific use, protecting groups are employed to protect a moiety on a
molecule from reacting with another reagent. Protecting groups of the present invention have
the following characteristics: they prevent selected reagents from modifying the group to
which they are attached; they are stable (that is, they remain attached to the molecule) to
the synthesis reaction conditions; they are removable under conditions that do not adversely
affect the remaining structure; and once removed, do not react appreciably with the surface or
surface-bound oligomer. The selection of a suitable protecting group will depend, of course,
on the chemical nature of the monomer unit and oligomer, as well as the specific reagents they
are to protect against.

In a preferred embodiment, the protecting groups are photoactivatable. The properties and uses
of photoreactive protecting compounds have been reviewed. See, McCray et al., Ann. Rev. of
Biophys. and Biophys. Chem. (1989) 18:239-270, which is incorporated herein by reference.
Preferably, the photosensitive protecting groups will be removable by radiation in the
ultraviolet (UV) or visible portion of the electromagnetic spectrum. More preferably, the
protecting groups will be removable by radiation in the near UV or visible portion of the
spectrum. In some embodiments, however, activation may be performed by other methods such as
localized heating, electron beam lithography, laser pumping, oxidation or reduction with
microelectrodes, and the like. Sulfonyl compounds are suitable reactive groups for electron
beam lithography. Oxidative or reductive removal is accomplished by exposure of the protecting
group to an electric current source, preferably using microelectrodes directed to the
predefined regions of the surface which are desired for activation. Other methods may be used
in light of this disclosure.

Many, although not all, of the photoremovable protecting groups will be aromatic compounds
that absorb near-UV and visible radiation. Suitable photoremovable protecting groups are
described in, for example, McCray et al., Patchornik, J. Amer. Chem. Soc. (1970) 92:6333, and
Amit et al., J. Org. Chem. (1974) 39:192, which are incorporated herein by reference.

A preferred class of photoremovable protecting groups has the general formula:

[Chemical structure: general formula of the preferred class of protecting groups]

where R¹, R², R³, and R⁴ independently are a hydrogen atom, a lower alkyl, aryl, benzyl,
halogen, hydroxyl, alkoxyl, thiol, thioether, amino, nitro, carboxyl, formate, formamido or
phosphido group, or adjacent substituents (i.e., R¹-R², R²-R³, R³-R⁴) are substituted oxygen
groups that together form a cyclic acetal or ketal; R⁵ is a hydrogen atom, an alkoxyl, alkyl,
hydrogen, halo, aryl, or alkenyl group, and n=0 or 1.
A preferred protecting group, 6-nitroveratryl (NV), which is used for protecting the carboxyl
terminus of an amino acid or the hydroxyl group of a nucleotide, for example, is formed when
R² and R³ are each a methoxy group, R¹, R⁴ and R⁵ are each a hydrogen atom, and n=0:

[Chemical structure: 6-nitroveratryl (NV)]

A preferred protecting group, 6-nitroveratryloxycarbonyl (NVOC), which is used to protect the
amino terminus of an amino acid, for example, is formed when R² and R³ are each a methoxy
group, R¹, R⁴ and R⁵ are each a hydrogen atom, and n=1:

[Chemical structure: 6-nitroveratryloxycarbonyl (NVOC)]

Another preferred protecting group, 6-nitropiperonyl (NP), which is used for protecting the
carboxyl terminus of an amino acid or the hydroxyl group of a nucleotide, for example, is
formed when R² and R³ together form a methylene acetal, R¹, R⁴ and R⁵ are each a hydrogen
atom, and n=0:

[Chemical structure: 6-nitropiperonyl (NP)]

Another preferred protecting group, 6-nitropiperonyloxycarbonyl (NPOC), which is used to
protect the amino terminus of an amino acid, for example, is formed when R² and R³ together
form a methylene acetal, R¹, R⁴ and R⁵ are each a hydrogen atom, and n=1:

[Chemical structure: 6-nitropiperonyloxycarbonyl (NPOC)]

A most preferred protecting group, methyl-6-nitroveratryl (MeNV), which is used for protecting
the carboxyl terminus of an amino acid or the hydroxyl group of a nucleotide, for example, is
formed when R² and R³ are each a methoxy group, R¹ and R⁴ are each a hydrogen atom, R⁵ is a
methyl group, and n=0:

[Chemical structure: methyl-6-nitroveratryl (MeNV)]

Another most preferred protecting group, methyl-6-nitroveratryloxycarbonyl (MeNVOC), which is
used to protect the amino terminus of an amino acid, for example, is formed when R² and R³ are
each a methoxy group, R¹ and R⁴ are each a hydrogen atom, R⁵ is a methyl group, and n=1:

[Chemical structure: methyl-6-nitroveratryloxycarbonyl (MeNVOC)]

Another most preferred protecting group, methyl-6-nitropiperonyl (MeNP), which is used for
protecting the carboxyl terminus of an amino acid or the hydroxyl group of a nucleotide, for
example, is formed when R² and R³ together form a methylene acetal, R¹ and R⁴ are each a
hydrogen atom, R⁵ is a methyl group, and n=0:

[Chemical structure: methyl-6-nitropiperonyl (MeNP)]

Another most preferred protecting group, methyl-6-nitropiperonyloxycarbonyl (MeNPOC), which is
used to protect the amino terminus of an amino acid, for example, is formed when R² and R³
together form a methylene acetal, R¹ and R⁴ are each a hydrogen atom, R⁵ is a methyl group,
and n=1:

[Chemical structure: methyl-6-nitropiperonyloxycarbonyl (MeNPOC)]

A protected amino acid having a photoactivatable oxycarbonyl protecting group, such as NVOC or
NPOC or their corresponding methyl derivatives, MeNVOC or MeNPOC, respectively, on the amino
terminus is formed by acylating the amine of the amino acid with an activated oxycarbonyl
ester of the protecting group. Examples of activated oxycarbonyl esters of NVOC and MeNVOC
have the general formula:

[Chemical structures: activated oxycarbonyl esters of NVOC and MeNVOC (MeNVOC-X)]

where X is halogen, mixed anhydride, phenoxy, p-nitrophenoxy, N-hydroxysuccinimide, and the
like.

A protected amino acid or nucleotide having a photoactivatable protecting group, such as NV or
NP or their corresponding methyl derivatives, MeNV or MeNP, respectively, on the carboxy
terminus of the amino acid or 5'-hydroxy terminus of the nucleotide, is formed by acylating
the carboxy terminus or 5'-OH with an activated benzyl derivative of the protecting group.
Examples of activated benzyl derivatives of MeNV and MeNP have the general formula:

[Chemical structures: activated benzyl derivatives of MeNV and MeNP]

where X is halogen, hydroxyl, tosyl, mesyl, trifluoromethyl, diazo, azido, and the like.

Another method for generating protected monomers is to react the benzylic alcohol derivative
of the protecting group with an activated ester of the monomer. For example, to protect the
carboxyl terminus of an amino acid, an activated ester of the amino acid is reacted with the
alcohol derivative of the protecting group, such as 6-nitroveratrol (NVOH). Examples of
activated esters suitable for such uses include halo-formate, mixed anhydride, imidazoyl
formate, acyl halide, and also includes formation of the activated ester in situ and the use
of common reagents such as DCC and the like. See Atherton et al. for other examples of
activated esters.

A further method for generating protected monomers is to react the benzylic alcohol derivative
of the protecting group with an activated carbon of the monomer. For example, to protect the
5'-hydroxyl group of a nucleic acid, a derivative having a 5'-activated carbon is reacted with
the alcohol derivative of the protecting group, such as methyl-6-nitropiperonol (MePyROH).
Examples of nucleotides having activating groups attached to the 5'-hydroxyl group have the
general formula:

[Chemical structure: nucleotide bearing an activating group Y at the 5'-position]

where Y is a halogen atom, a tosyl, mesyl, trifluoromethyl, azido, or diazo group, and the
like.

Another class of preferred photochemical protecting groups has the formula:

[Chemical structure: general formula of the pyrenyl class of protecting groups]

where R¹, R², and R³ independently are a hydrogen atom, a lower alkyl, aryl, benzyl, halogen,
hydroxyl, alkoxyl, thiol, thioether, amino, nitro, carboxyl, formate, formamido, sulfanates,
sulfido or phosphido group, R⁴ and R⁵ independently are a hydrogen atom, an alkoxy, alkyl,
halo, aryl, hydrogen, or alkenyl group, and n=0 or 1.

A preferred protecting group, 1-pyrenylmethyloxycarbonyl (PyROC), which is used to protect the
amino terminus of an amino acid, for example, is formed when R¹ through R⁵ are each a hydrogen
atom and n=1:

[Chemical structure: 1-pyrenylmethyloxycarbonyl (PyROC)]

Another preferred protecting group, 1-pyrenylmethyl (PyR), which is used for protecting the
carboxy terminus of an amino acid or the hydroxyl group of a nucleotide, for example, is
formed when R¹ through R⁵ are each a hydrogen atom and n=0:

[Chemical structure: 1-pyrenylmethyl (PyR)]




An amino acid having a pyrenylmethyloxycarbonyl protecting group on its amino terminus is
formed by acylation of the free amine of the amino acid with an activated oxycarbonyl ester of
the pyrenyl protecting group. Examples of activated oxycarbonyl esters of PyROC have the
general formula:

[Chemical structure: activated oxycarbonyl ester of PyROC]

where X is halogen, or mixed anhydride, p-nitrophenoxy, or N-hydroxysuccinimide group, and the
like.

A protected amino acid or nucleotide having a photoactivatable protecting group, such as PyR,
on the carboxy terminus of the amino acid or 5'-hydroxy terminus of the nucleic acid,
respectively, is formed by acylating the carboxy terminus or 5'-OH with an activated
pyrenylmethyl derivative of the protecting group. Examples of activated pyrenylmethyl
derivatives of PyR have the general formula:




where X is a halogen atom, a hydroxyl, diazo, or azido group, and the like.

Another method of generating protected monomers is to react the pyrenylmethyl alcohol moiety
of the protecting group with an activated ester of the monomer. For example, an activated
ester of an amino acid can be reacted with the alcohol derivative of the protecting group,
such as pyrenylmethyl alcohol (PyROH), to form the protected derivative of the carboxy
terminus of the amino acid. Examples of activated esters include halo-formate, mixed
anhydride, imidazoyl formate, acyl halide, and also includes formation of the activated ester
in situ and the use of common reagents such as DCC and the like.

Clearly, many photosensitive protecting groups are suit- 
able for use in the present invention. 

In preferred embodiments, the substrate is irradiated to remove the photoremovable protecting
groups and create regions having free reactive moieties and side products resulting from the
protecting group. The removal rate of the protecting groups depends on the wavelength and
intensity of the incident radiation, as well as the physical and chemical properties of the
protecting group itself. Preferred protecting groups are removed at a faster rate and with a
lower intensity of radiation. For example, at a given set of conditions, MeNVOC and MeNPOC are
photolytically removed from the N-terminus of a peptide chain faster than their unsubstituted
parent compounds, NVOC and NPOC, respectively.

Removal of the protecting group is accomplished by irradiation to liberate the reactive group
and degradation products derived from the protecting group. Not wishing to be bound by theory,
it is believed that irradiation of NVOC- and MeNVOC-protected oligomers proceeds by the
following reaction schemes:

NVOC-AA → 3,4-dimethoxy-6-nitrosobenzaldehyde + CO2 + AA

MeNVOC-AA → 3,4-dimethoxy-6-nitrosoacetophenone + CO2 + AA

where AA represents the N-terminus of the amino acid oligomer.

Along with the unprotected amino acid, other products are liberated into solution: carbon
dioxide and a 2,3-dimethoxy-6-nitrosophenylcarbonyl compound, which can react with
nucleophilic portions of the oligomer in unwanted secondary reactions. In the case of an
NVOC-protected amino acid, the degradation product is a nitrosobenzaldehyde, while the
degradation product for the other is a nitrosophenyl ketone. For instance, it is believed that
the product aldehyde from NVOC degradation reacts with free amines to form a Schiff base
(imine) that affects the remaining polymer synthesis. Preferred photoremovable protecting
groups react slowly or reversibly with the oligomer on the support.

Again not wishing to be bound by theory, it is believed that the product ketone from
irradiation of a MeNVOC-protected oligomer reacts at a slower rate with nucleophiles on the
oligomer than the product aldehyde from irradiation of the same NVOC-protected oligomer.
Although not unambiguously determined, it is believed that this difference in reaction rate is
due to the difference in general reactivity between aldehydes and ketones towards nucleophiles
due to steric and electronic effects.

The photoremovable protecting groups of the present invention are readily removed. For
example, the photolysis of N-protected L-phenylalanine in solution and having different
photoremovable protecting groups was analyzed, and the results are presented in the following
table:

TABLE

Photolysis of Protected L-Phe-OH
t1/2 in seconds

Solvent               NBOC    NVOC    MeNVOC    MeNPOC

Dioxane               1288    110     34        19
5 mM H2SO4/Dioxane    1573    98      33        22

The half life, t1/2, is the time in seconds required to
remove 50% of the starting amount of protecting group.
NBOC is the 6-nitrobenzyloxycarbonyl group, NVOC is the
6-nitroveratryloxycarbonyl group, MeNVOC is the
methyl-6-nitroveratryloxycarbonyl group, and MeNPOC is the
methyl-6-nitropiperonyloxycarbonyl group. The photolysis
was carried out in the indicated solvent with 362/364
nm-wavelength irradiation having an intensity of 10
mW/cm2, and the concentration of each protected
phenylalanine was 0.10 mM.

The table shows that deprotection of NVOC-, MeNVOC-,
and MeNPOC-protected phenylalanine proceeded faster
than the deprotection of NBOC. Furthermore, it shows that
the two derivatives that are substituted on the benzylic
carbon, MeNVOC and MeNPOC, were photolyzed at the highest
rates in both dioxane and acidified dioxane.
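
The half-lives reported above can be turned into rate constants and exposure times. The following is a minimal sketch only: it assumes that photolysis follows simple first-order kinetics (the text does not state this), and the function names are hypothetical. The values are those read from the acidified-dioxane row of the table.

```python
import math

# Half-lives in seconds, as read from the 5 mM H2SO4/dioxane row of the table.
HALF_LIFE_S = {"NBOC": 1573, "NVOC": 98, "MeNVOC": 33, "MeNPOC": 22}


def rate_constant(half_life_s):
    """First-order rate constant k = ln(2) / t_half, in 1/s (assumed kinetics)."""
    return math.log(2) / half_life_s


def fraction_remaining(elapsed_s, half_life_s):
    """Fraction of protecting group still attached after elapsed_s seconds."""
    return 0.5 ** (elapsed_s / half_life_s)


def time_to_remove(fraction_removed, half_life_s):
    """Irradiation time needed to remove the given fraction (0 <= f < 1)."""
    return -math.log(1.0 - fraction_removed) / rate_constant(half_life_s)


if __name__ == "__main__":
    for group, t_half in HALF_LIFE_S.items():
        speedup = HALF_LIFE_S["NBOC"] / t_half
        print(f"{group:7s} k = {rate_constant(t_half):.4f} 1/s, "
              f"{speedup:5.1f}x faster than NBOC, "
              f"99% removal in {time_to_remove(0.99, t_half):.0f} s")
```

Under this assumption, the ratio of two half-lives is the inverse ratio of the deprotection rates, which is how the table supports the statement that the benzylic-substituted groups are photolyzed fastest.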

1. Use of Photoremovable Groups During Solid-Phase 
Synthesis of Peptides 

The formation of peptides on a solid-phase support
requires the stepwise attachment of an amino acid to a
substrate-bound growing chain. In order to prevent
unwanted polymerization of the monomeric amino acid
under the reaction conditions, protection of the amino
terminus of the amino acid is required. After the monomer is
coupled to the end of the peptide, the N-terminal protecting
group is removed, and another amino acid is coupled to the
chain. This cycle of coupling and deprotecting is continued
for each amino acid in the peptide sequence. See Merrifield,
J. Am. Chem. Soc. (1963) 85:2149, and Atherton et al.,
"Solid Phase Peptide Synthesis" 1989, IRL Press, London,
both incorporated herein by reference for all purposes. As
described above, the use of a photoremovable protecting
group allows removal of selected portions of the substrate
surface, via patterned irradiation, during the deprotection
cycle of the solid phase synthesis. This selectively allows
spatial control of the synthesis: the next amino acid is
coupled only to the irradiated areas.

In one embodiment, the photoremovable protecting
groups of the present invention are attached to an activated
ester of an amino acid at the amino terminus:

[Chemical structure: amino acid with side chain R, activated
carboxyl group Y, and amino-terminal group NH-X]

where R is the side chain of a natural or unnatural amino
acid, X is a photoremovable protecting group, and Y is an
activated carboxylic acid derivative. The photoremovable
protecting group, X, is preferably NVOC, NPOC, PyROC,
MeNVOC, MeNPOC, and the like as discussed above. The
activated ester, Y, is preferably a reactive derivative having
a high coupling efficiency, such as an acyl halide, mixed
anhydride, N-hydroxysuccinimide ester, perfluorophenyl
ester, or urethane protected acid, and the like. Other
activated esters and reaction conditions are well known (See
Atherton et al.).

2. Use of Photoremovable Groups During Solid-Phase
Synthesis of Oligonucleotides

The formation of oligonucleotides on a solid-phase support
requires the stepwise attachment of a nucleotide to a
substrate-bound growing oligomer. In order to prevent
unwanted polymerization of the monomeric nucleotide
under the reaction conditions, protection of the 5'-hydroxyl
group of the nucleotide is required. After the monomer is
coupled to the end of the oligomer, the 5'-hydroxyl protecting
group is removed, and another nucleotide is coupled to
the chain. This cycle of coupling and deprotecting is
continued for each nucleotide in the oligomer sequence. See
Gait, "Oligonucleotide Synthesis: A Practical Approach"
1984, IRL Press, London, incorporated herein by reference
for all purposes. As described above, the use of a
photoremovable protecting group allows removal, via patterned
irradiation, of selected portions of the substrate surface
during the deprotection cycle of the solid phase synthesis.
This selectively allows spatial control of the synthesis: the
next nucleotide is coupled only to the irradiated areas.

Oligonucleotide synthesis generally involves coupling an
activated phosphorous derivative on the 3'-hydroxyl group
of a nucleotide with the 5'-hydroxyl group of an oligomer
bound to a solid support. Two major chemical methods exist
to perform this coupling: the phosphate-triester and
phosphoramidite methods (See Gait). Protecting groups of the
present invention are suitable for use in either method.

In a preferred embodiment a photoremovable protecting
group is attached to an activated nucleotide on the
5'-hydroxyl group:

[Chemical structure: nucleotide with base B, sugar substituent R,
activated phosphorous group P, and protecting group X]

where B is the base attached to the sugar ring; R is a
hydrogen atom when the sugar is deoxyribose or R is a
hydroxyl group when the sugar is ribose; P represents an
activated phosphorous group; and X is a photoremovable
protecting group, preferably as described above. The
activated phosphorous group, P, is preferably a reactive
derivative having a high coupling efficiency, such as a
phosphate-triester, phosphoramidite or the like. Other
activated phosphorous derivatives, as well as reaction
conditions, are well known (See Gait).

E. Amino Acid N-Carboxy Anhydrides Protected With a
Photoremovable Group

During Merrifield peptide synthesis, an activated ester of
one amino acid is coupled with the free amino terminus of
a substrate-bound oligomer. Activated esters of amino acids
suitable for the solid phase synthesis include halo-formate,
mixed anhydride, imidazoyl formate, acyl halide, and also
includes formation of the activated ester in situ and the use
of common reagents such as DCC and the like (See Atherton
et al.). A preferred protected and activated amino acid has
the general formula:

[Chemical structure: urethane-protected amino acid N-carboxy
anhydride with side chain R and protecting group X]

where R is the side chain of the amino acid and X is a
photoremovable protecting group. This compound is a
urethane-protected amino acid having a photoremovable
protecting group attached to the amine. A more preferred
activated amino acid is formed when the photoremovable
protecting group has the general formula:

[Chemical structure: protecting group bearing substituents
R1 through R5]

where R1, R2, R3, and R4 independently are a hydrogen
atom, a lower alkyl, aryl, benzyl, halogen, hydroxyl,
alkoxyl, thiol, thioether, amino, nitro, carboxyl, formate,
formamido or phosphido group, or adjacent substituents
(i.e., R1-R2, R2-R3, R3-R4) are substituted oxygen groups
that together form a cyclic acetal or ketal; and R5 is a
hydrogen atom, an alkoxyl, alkyl, halo, aryl, or alkenyl group.

A preferred activated amino acid is formed when the
photoremovable protecting group is
6-nitroveratryloxycarbonyl. That is, R1 and R4 are each a
hydrogen atom, R2 and R3 are each a methoxy group, and
R5 is a hydrogen atom. Another preferred activated amino
acid is formed when the photoremovable group is



6-nitropiperonyl: R1 and R4 are each a hydrogen atom, R2
and R3 together form a methylene acetal, and R5 is a
hydrogen atom. Other protecting groups are possible.
Another preferred activated ester is formed when the
photoremovable group is methyl-6-nitroveratryl or
methyl-6-nitropiperonyl.

Another preferred activated amino acid is formed when
the photoremovable protecting group has the general
formula:

[Chemical structure: protecting group bearing substituents
R1 through R5]

where R1, R2, and R3 independently are a hydrogen atom, a
lower alkyl, aryl, benzyl, halogen, hydroxyl, alkoxyl, thiol,
thioether, amino, nitro, carboxyl, formate, formamido,
sulfonates, sulfido or phosphido group, and R4 and R5
independently are a hydrogen atom or an alkoxy, aryl, or
alkenyl group. The resulting compound is a
urethane-protected amino acid having a
pyrenylmethyloxycarbonyl protecting group attached to the
amine. A more preferred embodiment is formed when R1
through R5 are each a hydrogen atom.

The urethane-protected amino acids having a
photoremovable protecting group of the present invention are
prepared by condensation of an N-protected amino acid with an
acylating agent such as an acyl halide, anhydride,
chloroformate and the like (See Fuller et al., U.S. Pat. No.
4,946,942 and Fuller et al., J. Amer. Chem. Soc. (1990)
112:7414-7416, both herein incorporated by reference for
all purposes).

Urethane-protected amino acids having photoremovable
protecting groups are generally useful as reagents during
solid-phase peptide synthesis, and because of the spatial
selectivity possible with the photoremovable protecting
group, are especially useful for spatially addressable
peptide synthesis. The urethane group first serves to activate
the carboxy terminus; once the peptide bond is formed, the
photoremovable protecting group protects the newly formed
amino terminus from further reaction. These amino acids are
also highly reactive to nucleophiles, such as deprotected
amines on the surface of the solid support, and due to this
high reactivity, coupling times are significantly reduced,
and yields are typically higher.

IV. Data Collection

A. Data Collection System

Substrates prepared in accordance with the above
description are used in one embodiment to determine which of the
plurality of sequences thereon bind to a receptor of interest.
FIG. 11 illustrates one embodiment of a device used to
detect regions of a substrate which contain fluorescent
markers. This device would be used, for example, to detect
the presence or absence of a labeled receptor such as an
antibody which has bound to a synthesized polymer on a
substrate.

Light is directed at the substrate from a light source 1002
such as a laser light source of the type well known to those
of skill in the art such as a model no. 2025 made by Spectra
Physics. Light from the source is directed at a lens 1004
which is preferably a cylindrical lens of the type well known
to those of skill in the art. The resulting output from the lens
1004 is a linear beam rather than a spot of light, resulting in
the capability to detect data substantially simultaneously
along a linear array of pixels rather than on a pixel-by-pixel
basis. It will be understood that a cylindrical lens is used
herein as an illustration of one technique for generating a
linear beam of light on a surface, but that other techniques
could also be utilized.

The beam from the cylindrical lens is passed through a
dichroic mirror or prism (1006) and directed at the surface
of the suitably prepared substrate 1008. Substrate 1008 is
placed on an x-y translation stage 1009 such as a model no.
PM500-8 made by Newport. Light at certain locations on the
substrate will be fluoresced and transmitted along the path
indicated by dashed lines back through the dichroic mirror,
and focused with a suitable lens 1010 such as an f/1.4
camera lens on a linear detector 1012 via a variable f stop
focusing lens 1014. Through use of a linear light beam, it
becomes possible to generate data over a line of pixels (such
as about 1 cm) along the substrate, rather than from
individual points on the substrate. In alternative embodiments,
light is directed at a 2-dimensional area of the substrate and
fluoresced light detected by a 2-dimensional CCD array.
Linear detection is preferred because substantially higher
power densities are obtained.

Detector 1012 detects the amount of light fluoresced from
the substrate as a function of position. According to one
embodiment the detector is a linear CCD array of the type
commonly known to those of skill in the art. The x-y
translation stage, the light source, and the detector 1012 are
all operably connected to a computer 1016 such as an IBM
PC-AT or equivalent for control of the device and data
collection from the CCD array.

In operation, the substrate is appropriately positioned by
the translation stage. The light source is then illuminated,
and intensity data are gathered with the computer via the
detector.

FIG. 12 illustrates the architecture of the data collection
system in greater detail. Operation of the system occurs
under the direction of the photon counting program 1102,
included herewith. The program interfaces with a
multichannel scaler 1106 such as a Stanford Research SR 430
and a stage controller 1108 such as a PM500. The signal from
the light from the fluorescing substrate enters a photon
counter 1110, providing output to the scaler. Data are output
from the scaler indicative of the number of counts in a given
region. After scanning a selected area, the stage controller is
activated with commands for acceleration and velocity, which in
turn drives the scan stage 1112 such as a PM500-A to
another region.

Data are collected in an image data file 1114 and
processed in a scaling program 1116, also included in Appendix
B. A scaled image is output for display on, for example, a
VGA display 1118. The image is scaled based on an input of
the percentage of pixels to clip and the minimum and
maximum pixel levels to be viewed. The system outputs for
use the min and max pixel levels in the raw data.

B. Data Analysis

The output from the data collection system is an array of
data indicative of fluorescent intensity versus location on the
substrate. The data are typically taken over regions
substantially smaller than the area in which synthesis of a given
polymer has taken place. Merely by way of example, if
polymers were synthesized in squares on the substrate
having dimensions of 500 microns by 500 microns, the data
may be taken over regions having dimensions of 5 microns
by 5 microns. In most preferred embodiments, the regions
over which fluorescence data are taken across the substrate
are less than about 1/2 the area of the regions in which
individual polymers are synthesized, preferably less than 1/10
the area in which a single polymer is synthesized, and most
preferably less than 1/100 the area in which a single polymer
is synthesized. Hence, within any area in which a given
polymer has been synthesized, a large number of
fluorescence data points are collected.
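
The arithmetic behind the 500 micron and 5 micron example can be checked with a short sketch. The function names are hypothetical, and square regions on a regular grid are assumed.

```python
def area_fraction(scan_side_um, cell_side_um):
    """Area of one square scan region as a fraction of one synthesis cell."""
    return (scan_side_um / cell_side_um) ** 2


def data_points_per_cell(scan_side_um, cell_side_um):
    """Whole scan regions that fit in one square synthesis cell."""
    per_side = int(cell_side_um // scan_side_um)
    return per_side * per_side


if __name__ == "__main__":
    # Example from the text: 5 x 5 micron readings in a 500 x 500 micron cell.
    print(data_points_per_cell(5, 500))   # number of intensity readings per cell
    print(area_fraction(5, 500))          # well under the 1/100 preferred limit
```

With these dimensions each synthesis cell yields 10,000 intensity readings, which is what makes the per-cell histogram analysis described next meaningful.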

A plot of number of pixels versus intensity for a scan of
a cell when it has been exposed to, for example, a labeled
antibody will typically take the form of a bell curve, but
spurious data are observed, particularly at higher intensities.
Since it is desirable to use an average of fluorescent intensity
over a given synthesis region in determining relative binding
affinity, these spurious data will tend to undesirably skew the
data.

Accordingly, in one embodiment of the invention the data
are corrected for removal of these spurious data points, and
an average of the data points is thereafter utilized in
determining relative binding efficiency.

FIG. 13 illustrates one embodiment of a system for
removal of spurious data from a set of fluorescence data such
as data used in affinity screening studies. A user or the
system inputs data relating to the chip location and cell
corners at step 1302. From this information and the image
file, the system creates a computer representation of a
histogram at step 1304, the histogram (at least in the form of
a computer file) plotting number of data pixels versus
intensity.

For each cell, a main data analysis loop is then performed.
For each cell, at step 1306, the system calculates the total
intensity or number of pixels for the bandwidth centered
around varying intensity levels. For example, as shown in
the plot to the right of step 1306, the system calculates the
number of pixels within the band of width w. The system
then "moves" this bandwidth to a higher center intensity, and
again calculates the number of pixels in the bandwidth. This
process is repeated until the entire range of intensities has
been scanned, and at step 1308 the system determines which
band has the highest total number of pixels. The data within
this bandwidth are used for further analysis. Assuming the
bandwidth is selected to be reasonably small, this procedure
will have the effect of eliminating spurious data located at
the higher intensity levels. The system then determines at step
1310 whether all cells have been evaluated and, if not,
repeats for the next cell.

At step 1312 the system then integrates the data within the
bandwidth for each of the selected cells, sorts the data at step
1314 using the synthesis procedure file, and displays the data
to a user on, for example, a video display or a printer.
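
The band-selection loop of steps 1306 and 1308 can be sketched as follows. This is an illustration under stated assumptions, not the program of Appendix B: the function names, the step size, and the choice to index each band by its lower edge rather than its center are all hypothetical.

```python
from bisect import bisect_left


def densest_band(intensities, width, step=1.0):
    """Return (low, high) of the fixed-width band holding the most pixels.

    The band is "moved" from the lowest observed intensity upward in
    increments of `step`; ties keep the lowest band.
    """
    values = sorted(intensities)
    if not values:
        raise ValueError("no intensity data for this cell")
    best_low, best_count = values[0], 0
    low = values[0]
    while low <= values[-1]:
        # Count pixels with low <= intensity < low + width.
        count = bisect_left(values, low + width) - bisect_left(values, low)
        if count > best_count:
            best_low, best_count = low, count
        low += step
    return best_low, best_low + width


def band_average(intensities, width, step=1.0):
    """Average only the pixels that fall inside the densest band."""
    low, high = densest_band(intensities, width, step)
    inside = [v for v in intensities if low <= v < high]
    return sum(inside) / len(inside)


if __name__ == "__main__":
    # Eight readings near 100 plus two spurious high-intensity pixels.
    cell = [100, 102, 98, 101, 99, 103, 97, 100, 900, 1200]
    print(densest_band(cell, width=10))
    print(band_average(cell, width=10), sum(cell) / len(cell))
```

In the example the two high readings fall outside the selected band, so the band average stays near the bulk of the data while the plain mean is pulled upward, which is the skew the procedure is meant to avoid.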

V. Representative Applications

A. Oligonucleotide Synthesis

The generality of light directed spatially addressable
parallel chemical synthesis is demonstrated by application to
nucleic acid synthesis.

1. Example

Light activated formation of a thymidine-cytidine dimer
was carried out. A three dimensional representation of a
fluorescence scan showing a checkerboard pattern generated
by the light-directed synthesis of a dinucleotide is shown in
FIG. 8. 5'-Nitroveratryl thymidine was attached to a
synthesis substrate through the 3' hydroxyl group. The nitroveratryl
protecting groups were removed by illumination through a
500 μm checkerboard mask. The substrate was then treated
with phosphoramidite activated 2'-deoxycytidine. In order to
follow the reaction fluorometrically, the deoxycytidine had
been modified with an FMOC protected aminohexyl linker
attached to the exocyclic amine (5'-O-dimethoxytrityl-4-N-
(6-N-fluorenylmethylcarbamoyl-hexylcarboxy)-2'-
deoxycytidine). After removal of the FMOC protecting
group with base, the regions which contained the
dinucleotide were fluorescently labelled by treatment of the
substrate with 1 mM FITC in DMF for one hour.

The three-dimensional representation of the fluorescent
intensity data in FIG. 14 clearly reproduces the
checkerboard illumination pattern used during photolysis of the
substrate. This result demonstrates that oligonucleotides as
well as peptides can be synthesized by the light-directed
method.

VI. Conclusion

The inventions herein provide a new approach for the
simultaneous synthesis of a large number of compounds.
The method can be applied whenever one has chemical
building blocks that can be coupled in a solid-phase format,
and when light can be used to generate a reactive group.

The above description is illustrative and not restrictive.
Many variations of the invention will become apparent to
those of skill in the art upon review of this disclosure.
Merely by way of example, while the invention is illustrated
primarily with regard to peptide and nucleotide synthesis,
the invention is not so limited. The scope of the invention
should, therefore, be determined not with reference to the
above description, but instead should be determined with
reference to the appended claims along with their full scope
of equivalents.

(1) GENERAL INFORMATION:

(iii) NUMBER OF SEQUENCES:

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Tyr Gly Gly Phe Leu
1               5

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Pro Gly Gly Phe Leu
1               5

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Tyr Gly Ala Gly Phe
1               5

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

Tyr Gly Ala Phe Leu Ser
1               5

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Tyr Gly Ala Phe Ser
1               5

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

Tyr Gly Ala Phe Leu
1               5



(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

Tyr Gly Gly Phe Leu Ser
1               5

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Tyr Gly Ala Phe
1

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

Tyr Gly Ala Leu Ser
1               5

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

Tyr Gly Gly Phe Ser
1               5

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

Tyr Gly Ala Leu
1



(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

Tyr Gly Ala Phe Leu Phe
1               5

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

Tyr Gly Ala Phe Phe
1               5

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

Tyr Gly Gly Leu Ser
1               5



(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

Tyr Gly Gly Phe Leu
1               5



(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

Tyr Gly Ala Phe Ser Phe
1               5

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

Tyr Gly Ala Phe Leu Ser Phe
1               5

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 6 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

Tyr Gly Ala Phe Met Gln
1               5

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

Tyr Gly Ala Phe Met
1               5

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: peptide

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

Tyr Gly Ala Phe Gln
1               5

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 5 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

ARRAYS OF NUCLEIC ACID PROBES ON
BIOLOGICAL CHIPS

CROSS-REFERENCE TO RELATED
APPLICATION

This application is a continuation-in-part of [...], which is
a continuation-in-part of U.S. patent application Ser. No.
082,937, filed 25 Jun. 1993, now abandoned, incorporated
herein by reference.

Research leading to the invention was funded in part by
NIH grant No. 1R01HG00813-01 and DOE grant No.
DE-FG03-92-ER81275, and the government may have
certain rights to the invention.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention provides arrays of oligonucleotide
probes immobilized in microfabricated patterns on silica
chips for analyzing molecular interactions of biological
interest. The invention therefore relates to diverse fields
impacted by the nature of molecular interaction, including
chemistry, biology, medicine, and medical diagnostics.

2. Description of Related Art

Oligonucleotide probes have long been used to detect
complementary nucleic acid sequences in a nucleic acid of
interest (the target nucleic acid). In some assay formats,
the oligonucleotide probe is tethered, i.e., by covalent
attachment, to a solid support, and arrays of oligonucleotide
probes immobilized on solid supports have been used to
detect specific nucleic acid sequences in a target nucleic
acid. See, e.g., PCT patent publication Nos. WO 89/10977
and 89/11548. Others have proposed the use of large
numbers of oligonucleotide probes to provide the complete
nucleic acid sequence of a target nucleic acid but failed to
provide an enabling method for using arrays of immobilized
probes for this purpose. See U.S. Pat. Nos. 5,202,231 and
5,002,867 and PCT patent publication No. WO 93/17126.

The development of VLSIPS™ technology has provided
methods for making very large arrays of oligonucleotide
probes in very small arrays. See U.S. Pat. No. 5,143,854 and
PCT patent publication Nos. WO 90/15070 and 92/10092,
each of which is incorporated herein by reference. U.S.
patent application Ser. No. 082,937, filed Jun. 25, 1993,
describes methods for making arrays of oligonucleotide
probes that can be used to provide the complete sequence of
a target nucleic acid and to detect the presence of a nucleic
acid containing a specific nucleotide sequence.

Microfabricated arrays of large numbers of oligonucleotide
probes, called "DNA chips", offer great promise for a
wide variety of applications. New methods and reagents are
required to realize this promise, and the present invention
helps meet that need.

SUMMARY OF THE INVENTION

The present invention provides methods for making
high-density arrays of oligonucleotide probes on silica chips and
for using those probe arrays to detect specific nucleic acid
sequences contained in a target nucleic acid in a sample. The
invention also provides arrays of oligonucleotide probes on
DNA chips, in which the probes have specific sequences and
locations in the array to facilitate identification of a specific
target nucleic acid. In another aspect, the invention provides
methods for detecting whether one or more specific
sequences of a target nucleic acid in a sample varies from a
previously characterized sequence or reference sequence.
The methods of the invention can be used to detect
variations between a target and reference sequence, including
single or multiple base substitutions, and deletions and
insertions of bases, as well as detecting the presence,
location, and sequence of other more complex variations
between a target and reference sequence nucleic acid.

The present invention provides arrays of oligonucleotide
probes immobilized on a solid support. The arrays are
preferably synthesized directly on the support using
VLSIPS™ technology, but other synthesis methods and
immobilization of pre-synthesized oligonucleotide probes
can be used to make the oligonucleotide probe arrays, called
"DNA chips", of the invention. In general, these arrays
comprise a set of oligonucleotide probes such that, for each
base in a specific reference sequence, the set includes a
probe (called the "wild-type" or "WT" probe) that is exactly
complementary to a section of the reference sequence
including the base of interest, and four additional probes
(called "substitution probes"), which are identical to the WT
probe except that the base of interest has been replaced by
one of a predetermined set (typically 4) of nucleotides. In the
preferred embodiment, one of the four substitution probes is
identical to the wild type probe; the other three are
complementary to targets that have a single-base substitution at this
position.

In another aspect, the invention relates to the arrangement
of probes on the chip. In one embodiment, the
probes are arranged on the chip so that probes for a given
position in the sequence are adjacent, and probes for
adjacent positions in the reference sequence are also adjacent to
one another on the chip. One method arranges the probes for
a single base in a short column (alternately row) and
arranges the columns in the order of the base position to
form horizontal (alternately vertical) stripes. The wild-type
and each of the substitution probes have specified positions
within the column so that all the probes corresponding to an
A substitution, for example, are in a single row. The stripes
may be separated on the chip by a blank row or column.

The DNA chips of the invention can be made in a wide
number of variations. For some applications, leaving out the
wild-type row, leaving out unimportant bases, pooling bases,
adding insertion and deletion probes, varying the length
of the probes across the chip to make the probes have the same
or similar Tm [...], using multiple probes for a single
position [...], placing blank "streets" (no probe) between rows,
columns, or individual probes, and using control probes may
be appropriate.

The present invention also provides DNA chips for
detecting mutations associated with cystic fibrosis, including
mutations in exons 4, 7, 9, 10, 11, 20, and 21 of the CFTR
gene. The invention also provides DNA chips for detecting
mutations in the p53 gene, in which mutations are known
to be associated with a wide variety of cancers. Other
DNA chips of the invention provide probe arrays for
detecting specific sequences of mitochondrial DNA, useful for
identification and forensic purposes. The invention also
provides DNA chips for detecting specific sequences of
nucleotides or mutations associated with the acquisition of a
drug resistant phenotype in an infectious organism, such as
rifampicin or other drug resistant TB strains and HIV, in
which mutations in the RNA polymerase gene are known to
give rise to drug resistance.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows how the tiling method of the invention
defines a set of DNA probes relative to a target nucleic acid.
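
The probe sets that the tiling method defines can be sketched in a few lines. This is a simplified illustration, not the patent's design procedure: the function names and the position of the interrogated base within the probe (`offset`) are hypothetical, and probes are written as the base-by-base complement of the reference window without reversing strand orientation. The default probe length of 16 follows the FIG. 1 description.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(sequence):
    """Base-by-base complement (strand orientation is ignored in this sketch)."""
    return "".join(COMPLEMENT[base] for base in sequence)


def tile(reference, probe_length=16, offset=7):
    """Build one probe set per interrogated base of the reference.

    Each set holds the wild-type probe and four substitution probes that
    differ from it only at the interrogated position; one of the four is
    therefore identical to the wild-type probe.
    """
    if not 0 <= offset < probe_length:
        raise ValueError("offset must lie inside the probe")
    probe_sets = []
    for start in range(len(reference) - probe_length + 1):
        window = reference[start:start + probe_length]
        wild_type = complement(window)
        substitutions = {
            base: wild_type[:offset] + base + wild_type[offset + 1:]
            for base in "ACGT"
        }
        probe_sets.append({
            "position": start + offset,   # index of the interrogated base
            "wild_type": wild_type,
            "substitutions": substitutions,
        })
    return probe_sets


if __name__ == "__main__":
    for probe_set in tile("ACGTACGTAC", probe_length=4, offset=1)[:2]:
        print(probe_set)
```

Arranging the four substitution probes of each set in a column, and the columns in order of `position`, gives the striped layout described in the summary.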
In the figure, the target is a DNA molecule, the probes are single-stranded nucleic acids 16
nucleotides in length, and only a portion of the probes defined by the method is shown.

FIG. 2 shows an illustrative tiled array of the invention with probes for the detection of point
mutations. The base at the position of substitution in each of the wild-type probes is shown in
the wild-type lane, and the shading shows the location of the substitution probe having the
wild-type sequence. The SEQ ID. NOS. corresponding to the two peptide sequences shown in the top
portion of FIG. 2 are 311 and 312, respectively. The SEQ ID. NOS. corresponding to the five
peptide sequences listed at the bottom of FIG. 2 are 313, 314, 315, 313, and 316, respectively.

FIG. 3, in panels A, B, and C, shows an image made from the region of a DNA chip containing CFTR
exon 10 probes; in panel A, the chip was hybridized to a wild-type target; in panel C, the chip
was hybridized to a mutant ΔF508 target; and in panel B, the chip was hybridized to a mixture of
the wild-type and mutant targets. The SEQ ID. NOS. corresponding to the four peptide sequences
shown in FIG. 3 are 317-320, respectively.

FIG. 4, in sheets 1-3, corresponding to panels A, B, and C of FIG. 3, shows graphs of
fluorescence intensity versus tiling position. The labels on the horizontal axis show the bases
in the wild-type sequence corresponding to the position of substitution in the respective
probes. Plotted are the intensities observed from the features (or synthesis sites) containing
wild-type probes, the features containing the substitution probes that bound the most target
("called"), and the feature containing the substitution probes that bound the target with the
second highest intensity of all the substitution probes ("2nd Highest"). The SEQ ID. NOS.
corresponding to the two peptide sequences shown in sheet 1 of FIG. 4 are 321 and 318,
respectively; the SEQ ID. NOS. corresponding to the two peptide sequences shown in sheet 2 of
FIG. 4 are 322 and 318, respectively; and the SEQ ID. NOS. corresponding to the two peptide
sequences shown in sheet 3 of FIG. 4 are 323 and 318, respectively.

FIG. 5, in panels A, B, and C, shows an image made from a region of a DNA chip containing CFTR
exon 10 probes; in panel A, the chip was hybridized to the wt480 target; in panel C, the chip was
hybridized to the mu480 target; and in panel B, the chip was hybridized to a mixture of the
wild-type and mutant targets. The SEQ ID. NOS. corresponding to the peptide sequences shown in
FIG. 5 are 324-327, respectively.

FIG. 6, in sheets 1-3, corresponding to panels A, B, and C of FIG. 5, shows graphs of
fluorescence intensity versus tiling position. The labels on the horizontal axis show the bases
in the wild-type sequence corresponding to the position of substitution in the respective
probes. Plotted are the intensities observed from the features (or synthesis sites) containing
wild-type probes, the features containing the substitution probes that bound the most target
("called"), and the feature containing the substitution probes that bound the target with the
second highest intensity of all the substitution probes ("2nd Highest"). The SEQ ID. NOS.
corresponding to the two peptide sequences shown in sheet 1 of FIG. 6 are 328 and 329,
respectively; the SEQ ID. NOS. corresponding to the two peptide sequences shown in sheet 2 of
FIG. 6 are 330 and 329, respectively; and the SEQ ID. NOS. corresponding to the two peptide
sequences shown in sheet 3 of FIG. 6 are 331 and 329, respectively.

FIG. 7, in panels A and B, shows an image made from a region of a DNA chip containing CFTR exon
10 probes; in panel A, the chip was hybridized to nucleic acid derived from the genomic DNA of an
individual with wild-type ΔF508 sequences; in panel B, the target nucleic acid originated from a
heterozygous (with respect to the ΔF508 mutation) individual.

FIG. 8, in sheets 1 and 2, corresponding to panels A and B of FIG. 7, shows graphs of
fluorescence intensity versus tiling position. The labels on the horizontal axis show the bases
in the wild-type sequence corresponding to the position of substitution in the respective
probes. Plotted are the intensities observed from the features (or synthesis sites) containing
wild-type probes, the features containing the substitution probes that bound the most target
("called"), and the feature containing the substitution probes that bound the target with the
second highest intensity of all the substitution probes ("2nd Highest"). The SEQ ID. NOS.
corresponding to the two peptide sequences shown in sheet 2 of FIG. 8 are 332 and 318,
respectively.

FIG. 9 shows the human mitochondrial genome; "O_H" is the H strand origin of replication, and
arrows indicate the cloned unshaded sequence.

FIG. 10 shows the image observed from application of a sample of mitochondrial DNA derived
nucleic acid (from the mt4 sample) on a DNA chip.

FIG. 11 is similar to FIG. 10 but shows the image observed from the mt5 sample.

FIG. 12 shows the predicted difference image between the mt4 and mt5 samples on the DNA chip
based on mismatches between the two samples and the reference sequence.

FIG. 13 shows the actual difference image observed for the mt4 and mt5 samples.

FIG. 14, in sheets 1 and 2, shows a plot of normalized intensities across rows 10 and 11 of the
array and a tabulation of the mutations detected.

FIG. 15 shows the discrimination between wild-type and mutant hybrids obtained with the chip. A
median of the six normalized hybridization scores for each probe was taken; the graph plots the
ratio of the median score to the normalized hybridization score versus mean counts. A ratio of
1.6 and mean counts above 50 yield no false positives.

FIG. 16 illustrates how identity of the base mismatch may influence ability to discriminate
mutant and wild-type sequences more than the position of the mismatch within an oligonucleotide
probe. The mismatch position is expressed as % of probe length from the 3'-end. The base change
is indicated on the graph.

FIG. 17 provides a 5' to 3' sequence listing of one target corresponding to the probes on the
chip. X is a control probe. Positions that differ in the target (i.e., are mismatched with the
probe at the designated site) are in bold. The SEQ ID. NO. corresponding to the peptide sequence
shown in FIG. 17 is 333.

FIG. 18 shows the fluorescence image produced by scanning the chip described in FIG. 17 when
hybridized to a sample.

FIG. 19 illustrates the detection of 4 transitions in the target sequence relative to the
wild-type probes on the chip in FIG. 18.

FIG. 20 shows the alignment of some of the probes on a p53 DNA chip with a 12-mer model target
nucleic acid. The SEQ ID. NOS. corresponding to the fourteen peptide sequences shown in FIG. 20
are 334-347, respectively.

FIG. 21 shows a set of 10-mer probes for a p53 exon 6 DNA chip. The SEQ ID. NOS. corresponding to
the thirteen peptide sequences shown in FIG. 21 are 334 and 348-359, respectively.
FIG. 22 shows that very distinct patterns are observed after hybridization of p53 DNA chips with
targets having different 1 base substitutions. In the first image in FIG. 22, the 12-mer probes
that form perfect matches with the wild-type target are in the first row (top). The 12-mer probes
with single base mismatches are located in the second, third, and fourth rows and have much lower
signals.

FIG. 23, in graphs 2, 3, and 4, graphically depicts the data in FIG. 22. On each graph, the X
ordinate is the position of the probe in its row on the chip, and the Y ordinate is the signal at
that probe site after hybridization.

FIG. 24 shows the results of hybridizing mixed target populations of WT and mutant p53 genes to
the p53 DNA chip.

FIG. 25, in graphs 1-4, shows (see FIG. 23 as well) the hybridization efficiency of a 10-mer
probe array as compared to a 12-mer probe array.

FIG. 26 shows an image of a p53 DNA chip hybridized to a target DNA.

FIG. 27 illustrates how the actual sequence was read from the chip shown in FIG. 26. Gaps in the
sequence of letters in the WT rows correspond to control probes or sites. Positions at which
bases are miscalled are represented by letters in italic type in cells corresponding to probes in
which the WT bases have been substituted by other bases. The SEQ ID. NO. corresponding to the
peptide sequence shown in FIG. 27 is 360.

FIG. 28 illustrates the VLSIPS™ technology as applied to the light directed synthesis of
oligonucleotides. Light (hν) is shone through a mask (M1) to activate functional groups (—OH) on
a surface by removal of a protecting group (X). Nucleoside building blocks protected with
photoremovable protecting groups (T-X, G-X) are coupled to the activated areas. By repeating the
irradiation and coupling steps, very complex arrays of oligonucleotides can be prepared.

FIG. 29 illustrates how the VLSIPS™ process can be used to prepare "nucleoside combinatorials" or
oligonucleotides synthesized by coupling all four nucleosides to form dimers, trimers, etc.

FIG. 30 shows the deprotection, coupling, and oxidation steps of a solid phase DNA synthesis
method.

FIG. 31 shows an illustrative synthesis route for the nucleoside building blocks used in the
VLSIPS™ method.

FIG. 32 shows a preferred photoremovable protecting group, MeNPOC, and how to prepare the group
in active form.

FIG. 33 shows an illustrative detection system for scanning a DNA chip.

DETAILED DESCRIPTION OF THE INVENTION

Using the VLSIPS™ method, one can synthesize arrays of many thousands of oligonucleotide probes
on a substrate, such as a glass slide or chip. The method can be used, for instance, to
synthesize "combinatorial" arrays consisting of, for example, all possible octanucleotides. Such
arrays can be used for primary sequencing-by-hybridization on genomic DNA fragments or other
nucleic acids or to detect mutations in a target nucleic acid for which the normal or
"wild-type" nucleotide sequence is already known. Using the preferred method of the invention,
one employs a strategy called "tiling" to synthesize specific sets of probes at
spatially-defined locations on a substrate, creating the novel probe arrays and "DNA chips" of
the invention.

To illustrate the tiling method of the invention, consider the problem of detecting mutations at
one or more positions in the nucleotide sequence of a target nucleic acid with oligonucleotide
probes of defined length. The length (L) of the probe is typically expressed as the number of
nucleotides or bases in a single-stranded nucleic acid probe. For purposes of the present
invention, lengths ranging from 12 to 18 bases are preferred, although shorter and longer lengths
can also be employed. To employ the tiling method, one synthesizes a set of probes defined by the
particular nucleotide sequence of interest in the target nucleic acid. For each base in the
target DNA segment, one synthesizes a probe complementary to the subsequence of the target
nucleic acid beginning at that base and ending L-1 bases to the 3'-side (see FIG. 1).

In a preferred embodiment of the invention, the probes are arranged (by immobilization, typically
by covalent attachment of a pre-synthesized probe or by synthesis of the probe on the substrate)
on the substrate or chips in lanes stretching across the chip and separated, and these lanes are
in turn arranged in blocks of preferably 5 lanes, although blocks of other sizes will have useful
application, as will be apparent from the following illustration. The first of these five lanes,
called the "wild-type lane," contains probes arranged in order of sequence, and all of the probes
are complementary to a specified wild-type nucleic acid sequence. The other four lanes contain
probes for detecting all possible single-base mutations in the defined sequence; in turn, these
probe sets are defined by a position of potential non-complementarity in the probe relative to
the target (a single base mismatch) and the identity of the nucleotide in the probe at that
position (i.e., whether the nucleotide is an A, C, G, or T nucleotide). The position of mismatch,
also called the position of substitution, is preferably selected to be near the center of the
probes, i.e., position 7 of a probe of L=15.

For each probe in the wild-type lane, one synthesizes four probes (one for each of the lanes
other than the wild-type lane). Three of these four probes are identical to the corresponding
wild-type probe but for the base at the position of substitution, and the remaining probe is
identical to the wild-type probe. This set of four substitution probes is preferably placed in a
column directly below (or above) the corresponding wild-type probe, thus creating an A-lane, a
C-lane, a G-lane, and a T-lane. FIG. 2 shows an illustrative tiled array of the invention with
probes for the detection of point mutations. The base at the position of substitution in each of
the wild-type probes is shown in the wild-type lane, and the shading shows the location of the
substitution probe having the wild-type sequence. Below are the probes that would be placed in
the column marked by the arrow if the probe length were 15 and the position of substitution were
7:

3'-CCGACTGCAGTCGTT (SEQ. ID. NO:1)
3'-CCGACTACAGTCGTT (SEQ. ID. NO:2)
3'-CCGACTCCAGTCGTT (SEQ. ID. NO:3)
3'-CCGACTGCAGTCGTT (SEQ. ID. NO:1)
3'-CCGACTTCAGTCGTT (SEQ. ID. NO:4)

Thus, the substitution lanes occupy four of the five lanes separating successive wild-type lanes
on the chip; the blocks of five lanes can be separated by a sixth lane for measurement of
background signals.

The DNA chips of the invention have a wide variety of applications. In one embodiment, the DNA
chip is used to select an optimal probe from an array of probes. In this embodiment, an array of
probes of variable length and sequences is synthesized and then hybridized to a target nucleic
acid of known sequence. The pattern of hybridization reveals the optimal length and sequence
composition of
probes to detect a particular mutation or other specific sequence of nucleotides. In some
circumstances, i.e., target nucleic acids with repeated sequences or with high G/C content, very
long probes may be required for optimal detection. In one embodiment for detecting specific
sequences in a target nucleic acid with a DNA chip, repeat sequences are detected as follows. The
chip comprises probes of length sufficient to extend into the repeat region varying distances
from each end. The sample, prior to hybridization, is treated with a labeled oligonucleotide that
is complementary to a repeat region but shorter than the full length of the repeat. The target
nucleic acid is labeled with a second, distinct label. After hybridization, the chip is scanned
for probes that have bound both the labeled target and the labeled oligonucleotide probe; the
presence of such bound probes shows that at least two repeat sequences are present.

A variety of methods can be used to enhance detection of labeled targets bound to a probe on the
array. In one embodiment, the protein MutS (from E. coli) or equivalent proteins such as yeast
MSH1, MSH2, and MSH3; mouse Rep-3, and Streptococcus Hex-A, is used in conjunction with target
hybridization to detect probe-target complexes that contain mismatched base pairs. The protein,
labeled directly or indirectly, can be added to the chip during or after hybridization of target
nucleic acid, and differentially binds to homo- and heteroduplex nucleic acid. A wide variety of
dyes and other labels can be used for similar purposes. For instance, the dye YOYO-1 is known to
bind preferentially to nucleic acids containing sequences comprising runs of 3 or more G
residues.

The DNA chips produced by the methods of the invention can be used to study and detect mutations
in exons of human genes of clinical interest, including point mutations and deletions. In the
following sections, the method of the invention is illustrated by the detection of mutations in a
variety of clinically and medically significant human nucleic acid sequences. Thus, the invention
is illustrated first with respect to the preparation of DNA chips for the detection of mutations
associated with cystic fibrosis, then with DNA chips for the detection of human mitochondrial DNA
sequences, then with DNA chips for the detection of mutations in the human p53 gene associated
with cancer, and finally with respect to the detection of mutations in the HIV RT gene associated
with drug resistance.

Detection of Cystic Fibrosis Mutations with DNA Chips

A number of years ago, cystic fibrosis, the most common severe autosomal recessive disorder in
humans, was shown to be associated with mutations in a gene thereafter named the Cystic Fibrosis
Transmembrane Conductance Regulator (CFTR) gene. The sequences of the exons and parts of the
introns in the gene are known, as are the changes corresponding to several hundred known
mutations. Several tests have been developed for detecting the most frequent of these mutations.
The present invention provides CFTR gene oligonucleotide arrays (DNA chips) that can be used to
identify mutations in the CFTR gene rapidly and efficiently.

The methods used to make the high-density DNA chips of the invention allow probes for long
stretches of DNA coding regions to be directly "written" onto the chips in the form of sets of
overlapping oligonucleotides. These methods have been used to develop a number of useful CFTR
gene chips; one illustrative chip bears an array of 1296 probes covering the full length of exon
10 of the CFTR gene arranged in a 36x36 array of 356 μm elements. The probes in the array can
have any length, preferably in the range of from 10 to 18 residues and can be used to detect and
sequence any single-base substitution and any deletion within the 192-base exon, including the
three-base deletion known as ΔF508. As described in detail below, hybridization of sub-nanomolar
concentrations of wild-type and ΔF508 oligonucleotide target nucleic acids labeled with
fluorescein to these arrays produces highly specific signals (detected with confocal scanning
fluorescence microscopy) that permit discrimination between mutant and wild-type target sequences
in both homozygous and heterozygous cases. The method and chips of the invention can also be used
to detect other known mutations in the CFTR gene, as described in detail below.

The most common cystic fibrosis mutation is known as ΔF508. The present invention provides DNA
chips for detecting ΔF508; one such chip results from applying the tiling method to exon 10 of
the CFTR gene, the exon to which ΔF508 has been mapped. The tiling method involved the synthesis
of a set of probes of a selected length in the range of from 10 to 18 bases and complementary to
subsequences of the known wild-type CFTR sequence starting at a position a few bases into the
intron on the 5'-side of exon 10 and ending a few bases into the intron on the 3'-side. There was
a probe for each possible subsequence of the given segment of the gene, and the probes were
organized into a "lane" in such a way that traversing the lane from the upper left-hand corner of
the chip to the lower right-hand corner corresponded to traversing the gene segment base-by-base
from the 5'-end. The lane containing that set of probes is, as noted above, called the
"wild-type lane."

Relative to the wild-type lane, a "substitution" lane, called the "A-lane," was synthesized on
the chip. The A-lane probes were identical in sequence to an adjacent (immediately below the
corresponding) wild-type probe but contained, regardless of the sequence of the wild-type probe,
a dA residue at position 7 (counting from the 3'-end). In similar fashion, substitution lanes
with replacement bases dC, dG, and dT were placed onto the chip in a "C-lane," a "G-lane," and a
"T-lane," respectively. A sixth lane on the chip consisted of probes identical to those in the
wild-type lane but for the deletion of the base in position 7 and restoration of the original
probe length by addition to the 5'-end of the base complementary to the gene at that position.

The four substitution lanes enable one to deduce the sequence of a target exon 10 nucleic acid
from the relative intensities with which the target hybridizes to the probes in the various
lanes. The probe organization on the chip can be conveniently columnar, and the set of probes
consisting of a wild-type probe and four corresponding substitution probes is referred to as a
"column set." One and only one of the four substitution probes in a column set has exactly the
same sequence as the wild-type probe in the set. Those of skill in the art will appreciate that,
in other embodiments of the invention, one could delete one or more lanes or columns and still
benefit from the invention. Various versions of such exon 10 DNA chips were made as described
above with probes 15 bases long, as well as chips with probes 10, 14, and 18 bases long. For the
results described below, the probes were 15 bases long, and the position of substitution was 7
from the 3'-end.

To demonstrate the ability of the chip to distinguish the ΔF508 mutation from the wild-type, two
synthetic target nucleic acids were made. The first, a 39-mer complementary to a subsequence of
exon 10 of the CFTR gene having the three bases involved in the ΔF508 mutation near its center,
is called the "wild-type" or wt508 target, corresponds to positions 111-149 of the exon, and has
the sequence shown below:
5'-CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGA (SEQ. ID NO:5).

The second, a 36-mer probe derived from the wild-type target by removing those same three bases,
is called the "mutant" target or mu508 target and has the sequence shown below, first with dashes
to indicate the deleted bases, and then without dashes but with one base underlined (to indicate
the base detected by the T-lane probe, as discussed below):

5'-CATTAAAGAAAATATCAT---TGGTGTTTCCTATGATGA (SEQ. ID NO:6)
5'-CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA (SEQ. ID NO:7)

Both targets were labeled with fluorescein at the 5'-end.

In three separate experiments, the wild-type target, the mutant target, and an equimolar mixture
of both targets were exposed (0.1 nM wt508, 0.1 nM mu508, and 0.1 nM wt508 plus 0.1 nM mu508,
respectively, in a solution compatible with nucleic acid hybridization) to a CF chip. The
hybridization mixture was incubated overnight at room temperature, and then the chip was scanned
on a reader (a confocal fluorescence microscope in photon-counting mode; images of the chip were
constructed from the photon counts) at several successively higher temperatures while still in
contact with the target solution. After each temperature change, the chip was allowed to
equilibrate for approximately one-half hour before being scanned. After each set of scans, the
chip was exposed to denaturing solvent and conditions to wash, i.e., remove target that had
bound, the chip so that the next experiment could be done with a clean chip.

The results of the experiments are shown in FIGS. 3, 4, 5, and 6. FIG. 3, in panels A, B, and C,
shows an image made from the region of a DNA chip containing CFTR exon 10 probes; in panel A, the
chip was hybridized to a wild-type target; in panel C, the chip was hybridized to a mutant ΔF508
target; and in panel B, the chip was hybridized to a mixture of the wild-type and mutant targets.
FIG. 4, in sheets 1-3, corresponding to panels A, B, and C of FIG. 3, shows graphs of
fluorescence intensity versus tiling position. The labels on the horizontal axis show the bases
in the wild-type sequence corresponding to the position of substitution in the respective
probes. Plotted are the intensities observed from the features (or synthesis sites) containing
wild-type probes, the features containing the substitution probes that bound the most target
("called"), and the feature containing the substitution probes that bound the target with the
second highest intensity of all the substitution probes ("2nd Highest").

These figures show that, for the wild-type target and the equimolar mixture of targets, the
substitution probe with a nucleotide sequence identical to the corresponding wild-type probe
bound the most target, allowing for an unambiguous assignment of target sequence as shown by
letters near the points on the curve. The target wt508 thus hybridized to the probes in the
wild-type lane of the chip, although the strength of the hybridization varied from
probe-to-probe, probably due to differences in melting temperature. The sequence of most of the
target can thus be read directly from the chip, by inference from the pattern of hybridization in
the lanes of substitution probes (if the target hybridizes most intensely to the probe in the
A-lane, then one infers that the target has a T in the position of substitution, and so on).

For the mutant target, the sequence could similarly be called on the 3'-side of the deletion.
However, the intensity of binding declined precipitously as the point of substitution approached
the site of the deletion from the 3'-end of the target, so that the binding intensity on the
wild-type probe whose point of substitution corresponds to the T at the 3'-end of the deletion
was very close to background. Following that pattern, the wild-type probe whose point of
substitution corresponds to the middle base (also a T) of the deletion bound still less target.
However, the probe in the T-lane of that column set bound the target very well.

Examination of the sequences of the two targets reveals that the deletion places an A at that
position when the sequences are aligned at their 3'-ends and that the T-lane probe is
complementary to the mutant target with but two mismatches near an end (shown below in lower-case
letters with the position of substitution underlined):

Target: 5'-CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA
Probe: 3'-TagTAGTAACCACAA (SEQ. ID NO:8)

Thus the T-lane probe in that column set calls the correct base from the mutant sequence. Note
that, in the graph for the equimolar mixture of the two targets, that T-lane probe binds almost
as much target as does the A-lane probe in the same column set, whereas in the other column sets,
the probes that do not have wild-type sequence do not bind target at all as well. Thus, that one
column set, and in particular the T-lane probe within that set, detects the ΔF508 mutation under
conditions that simulate the homozygous case and also conditions that simulate the heterozygous
case.

The present invention thus provides individual probes, sets of probes, and arrays of probe sets
on chips, in specific patterns, as the probes provide important benefits for detecting the
presence of specific exon 10 sequences. The sequences of several important probes of the
invention are shown below. In each case, the letter "X" stands for the point of substitution in a
given column set, so each of the sequences actually represents four probes, with A, C, G, and T,
respectively, taking the place of the "X." Sets of shorter probes derived from the sets shown
below by removing up to five bases from the 5'-end of each probe and sets of longer probes made
from this set by adding up to three bases from the exon 10 sequence to the 5'-end of each probe,
are also useful and provided by the invention.

3'-TTTATAXTAGAAACC (SEQ. ID NO:9)
3'-TTATAGXAGAAACCA (SEQ. ID NO:10)
3'-TATAGTXGAAACCAC (SEQ. ID NO:11)
3'-ATAGTAXAAACCACA (SEQ. ID NO:12)
3'-TAGTAGXAACCACAA (SEQ. ID NO:13)
3'-AGTAGAXACCACAAA (SEQ. ID NO:14)
3'-GTAGAAXCCACAAAG (SEQ. ID NO:15)
3'-TAGAAAXCACAAAGG (SEQ. ID NO:16)
3'-AGAAACXACAAAGGA (SEQ. ID NO:17)

Although in this example the sequence could not be reliably deduced near the ends of the target,
where there is not enough overlap between target and probe to allow effective hybridization, and
around the center of the target, where hybridization was weak for some other reason, perhaps high
AT-content, the results show the method and the probes of the invention can be used to detect the
mutation of interest. The mutant target gave a pattern of hybridization that was very similar to
that of the wt508 target at the ends, where the two share a common sequence, and very different
in the middle, where the deletion is located. As one scans the image from right to left, the
intensity of hybridization of the target to the probes in the wild-type lane drops off much more
rapidly near the center of the image for mu508 than for wt508; in addition, there is one probe in
the T-lane that hybridizes intensely with mu508 and hardly at all with wt508. The results from
the equimolar mixture of the two targets, which represents the case one would encounter in
testing a heterozygous individual for the mutation, are a
blend of the results for the separate targets, showing the
power of the invention to distinguish a wild-type target
sequence from one containing the ΔF508 mutation and to
detect a mixture of the two sequences.
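
The mismatch pattern described for the T-lane probe of SEQ. ID NO:8 can be checked with a short sketch. The Python below is an illustration only and is not part of the original disclosure: the helper name `best_alignment` is hypothetical, and it simply slides the probe (written 3' to 5') along each target (written 5' to 3') and reports the ungapped alignment with the fewest non-complementary positions.

```python
# Hypothetical helper, not part of the original disclosure: counts mismatches
# between a probe (written 3'->5') and a target (written 5'->3').
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

WT508 = "CATTAAAGAAAATATCATCTTTGGTGTTTCCTATGATGA"  # SEQ. ID NO:5
MU508 = "CATTAAAGAAAATATCATTGGTGTTTCCTATGATGA"     # SEQ. ID NO:7
T_LANE_PROBE = "TAGTAGTAACCACAA"                    # SEQ. ID NO:8, 3'->5'


def best_alignment(probe_3to5, target_5to3):
    """Return (offset, mismatch_positions) for the best ungapped alignment.

    offset is the 0-based start of the aligned window in the target;
    mismatch_positions are 1-based and counted from the probe's 3'-end.
    """
    best = None
    for offset in range(len(target_5to3) - len(probe_3to5) + 1):
        window = target_5to3[offset:offset + len(probe_3to5)]
        mismatches = [
            i + 1
            for i, (p, t) in enumerate(zip(probe_3to5, window))
            if COMPLEMENT[p] != t
        ]
        # Keep the first alignment with the fewest mismatches.
        if best is None or len(mismatches) < len(best[1]):
            best = (offset, mismatches)
    return best


for name, target in (("wt508", WT508), ("mu508", MU508)):
    offset, mismatches = best_alignment(T_LANE_PROBE, target)
    print(name, "offset", offset, "mismatches at", mismatches)
```

Under these assumptions the probe aligns to mu508 with two mismatches at positions 2 and 3 from its 3'-end, matching the lower-case letters shown for SEQ. ID NO:8, and to wt508 with a single mismatch at position 7, the central position of substitution. Mismatch counts alone do not predict hybridization intensity; the sketch only reproduces the alignment described in the text.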

The results above clearly demonstrate how the DNA chips
of the invention can be used to detect a deletion mutation,
ΔF508; another model system was used to show that the
chips can also be used to detect a point mutation. One
of the more frequent mutations in the CFTR gene is G480C,
which involves the replacement of the G in position 46 of
exon 10 by a T, resulting in the substitution of a cysteine for
the glycine normally in position #480 of the CFTR protein.
The model target sequences included the 21-mer probe
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terns. The wild-type sequence could easily be read from the 
chip, but the probe that bound the mu480 target so well when 
only the mu480 target was present also bound it well when 
both the mutant and wild-type targets were present in a 
mixture, making the hybridization pattern easily distinguishable
from that of the wild-type target alone. These results
again show the power of the DNA chips of the invention to 
detect point mutations in both homo- and heterozygous 
individuals. 
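
The column-set construction and the base-calling rule used throughout these examples can be sketched as follows. This Python is a minimal illustration under stated assumptions, not the method of the disclosure: the function names `column_set` and `call_base` are hypothetical, the intensity values are made up, and a real reader would also have to handle background, ties, and mixed (heterozygous) signals.

```python
# Hypothetical helpers, not part of the original disclosure.
BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def column_set(wild_type_probe_3to5, position=7):
    """Return {lane: probe} for the A-, C-, G-, and T-lanes.

    position is 1-based and counted from the 3'-end of the probe, which is
    the left end of a probe written 3'->5'.
    """
    i = position - 1
    return {
        base: wild_type_probe_3to5[:i] + base + wild_type_probe_3to5[i + 1:]
        for base in BASES
    }


def call_base(lane_intensities):
    """Infer the target base as the complement of the brightest lane."""
    brightest_lane = max(lane_intensities, key=lane_intensities.get)
    return COMPLEMENT[brightest_lane]


probes = column_set("CCGACTGCAGTCGTT")  # wild-type probe, SEQ. ID. NO:1
for lane, probe in probes.items():
    print(lane + "-lane: 3'-" + probe)

# Made-up photon counts for one column set, for illustration only.
print(call_base({"A": 950, "C": 40, "G": 35, "T": 60}))
```

With the wild-type probe of SEQ. ID. NO:1 and substitution at position 7, the four lanes reproduce SEQ. ID. NOS:2, 3, 1, and 4. In the made-up reading the A-lane is brightest, so the called target base is T, which is the pattern described for the mu480 target at position 46 of exon 10.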

wt480 to represent the wild-type sequence at positions 37-55 of exon 10:
5'-CCTTCAGAGGGTAAAATTAAG (SEQ. ID NO:18) and the 21-mer probe mu480 to represent
the mutant sequence: 5'-CCTTCAGAGTGTAAAATTAAG (SEQ. ID NO:19).

To demonstrate clinical application of the DNA chips of
the invention, the chips were used to study and detect
mutations in nucleic acids from genomic samples. Genomic
samples from an individual carrying only the wild-type gene
and an individual heterozygous for ΔF508 were amplified by
PCR using exon 10 primers containing the promoter for T7
RNA polymerase. Illustrative primers of the invention are
shown below.



Exon    Name        Sequence

10      CFi9-T7     TAATACGACTCACTATAGGGAGatgacctaataatgatgggttt (SEQ. ID. NO:20)
10      CFi10c-T7   TAATACGACTCACTATAGGGAGtagtgtgaagggttcatatgc (SEQ. ID. NO:21)
10      CFi10c-T3   CTCGGAATTAACCCTCACTAAAGGtagtgtgaagggttcatatg (SEQ. ID. NO:22)
10, 11  CFi10-T7    TAATACGACTCACTATAGGGAGagcatactaaaagtgactctc (SEQ. ID. NO:23)
11      CFi11c-T7   TAATACGACTCACTATAGGGAGacatgaatgacatttacagaa (SEQ. ID. NO:24)
11      CFi11c-T3   CGGAATTAACCCTCACTAAAGGacatgaatgacatttacagcaa (SEQ. ID. NO:25)



In separate experiments, a DNA chip was hybridized to each of the targets wt480 and
mu480, respectively, and then scanned with a confocal microscope. FIG. 5, in panels A, B,
and C, shows an image made from the region of a DNA chip containing CFTR exon 10
probes; in panel A, the chip was hybridized to the wt480 target; in panel C, the chip was
hybridized to the mu480 target; and in panel B, the chip was hybridized to a mixture of
the wild-type and mutant targets. FIG. 6, in sheets 1-3, corresponding to panels A, B,
and C of FIG. 5, shows graphs of fluorescence intensity versus tiling position. The labels
on the horizontal axis show the bases in the wild-type sequence corresponding to the
position of substitution in the respective probes. Plotted are the intensities observed
from the features (or synthesis sites) containing wild-type probes, the features
containing the substitution probes that bound the most target ("called"), and the feature
containing the substitution probes that bound the target with the second highest
intensity of all the substitution probes ("2nd Highest").

These figures show that the chip could be used to sequence a 16-base stretch from the
center of the target wt480 and that discrimination against mismatches is quite good
throughout the sequenced region. When the DNA chip was exposed to the target mu480,
only one probe in the portion of the chip shown bound the target well: the probe in the
set of probes devoted to identifying the base at position 46 in exon 10 and that has an A
in the position of substitution and so is fully complementary to the central portion of
the mutant target. All other probes in that region of the chip have at least one mismatch
with the mutant target and therefore bind much less of it. In spite of that fact, the
sequence of mu480 for several positions to both sides of the mutation can be read from
the chip, albeit with much-reduced intensities from those observed with the wild-type
target.

The results also show that, when the two targets were mixed together and exposed to the
chip, the hybridization pattern observed was a combination of the other two patterns.

The primers listed above can be used to amplify exon 10 or exon 11 sequences; in another
embodiment, multiplex PCR is employed, using two or more pairs of primers to amplify
more than one exon at a time.

The product of amplification was then used as a template for the RNA polymerase, with
fluoresceinated UTP present to label the RNA product. After sufficient RNA was made, it
was fragmented and applied to an exon 10 DNA chip for 15 minutes, after which the chip
was washed with hybridization buffer and scanned with the fluorescence microscope. A
useful positive control included on many CF exon 10 chips is the 8-mer 3'-CGCCGCCG-5'.
FIG. 7, in panels A and B, shows an image made from a region of a DNA chip containing
CFTR exon 10 probes; in panel A, the chip was hybridized to nucleic acid derived from the
genomic DNA of an individual with wild-type ΔF508 sequences; in panel B, the target
nucleic acid originated from a heterozygous (with respect to the ΔF508 mutation)
individual. FIG. 8, in sheets 1 and 2, corresponding to panels A and B of FIG. 7, shows
graphs of fluorescence intensity versus tiling position.

These figures show that the sequence of the wild-type RNA can be called for most of the
bases near the mutation. In the case of the ΔF508 heterozygous carrier, one particular
probe in the T-lane, the same one that distinguished so clearly between the wild-type and
mutant oligonucleotide targets in the model system described above, binds a large amount
of RNA, while the same probe binds little RNA from the wild-type individual. These
results show that the DNA chips of the invention are capable of detecting the ΔF508
mutation in a heterozygous carrier.

Thus, the present invention provides methods for synthesizing large numbers of
oligonucleotide probes on a glass substrate and unique probe sets in a defined array in
which the probes are arranged in the array by the "tiling" method of the invention. The
DNA chips produced by the method can be used to detect mutations in particular sequences
of a target nucleic acid, such as genomic DNA or RNA produced from transcription of an
amplified genomic DNA. These chips can be used to detect both point mutations and small
deletions. Moreover, the pattern of hybridization to the chip allows inferences to be
drawn about the sequences of the mutant DNAs.
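
The "called" and "2nd Highest" readout described for the intensity graphs can be sketched in a few lines. This is only an illustrative sketch, not the analysis software used to produce the figures: the function name `call_positions`, the data layout (one dictionary of lane intensities per tiled position), and the intensity values are all assumptions made up for the example. The inferred target base is taken as the complement of the brightest lane, which is how the text reads an A-lane signal as a T in the target.

```python
# Hypothetical sketch of per-position base calling from four substitution lanes.
# Names, data layout, and numbers are illustrative assumptions, not patent data.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
LANES = "ACGT"


def call_positions(lane_intensities):
    """For each tiled position, pick the brightest lane and the runner-up.

    lane_intensities: list of dicts mapping lane base -> fluorescence intensity.
    Returns a list of dicts with the brightest lane, the inferred target base
    (complement of that lane), the called intensity, and the second highest.
    """
    calls = []
    for lanes in lane_intensities:
        ranked = sorted(LANES, key=lambda base: lanes[base], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        calls.append({
            "lane": best,
            "target_base": COMPLEMENT[best],
            "called": lanes[best],
            "second_highest": lanes[runner_up],
        })
    return calls


# Made-up intensities for three consecutive tiling positions.
example = [
    {"A": 12.0, "C": 95.0, "G": 8.0, "T": 10.0},
    {"A": 7.0, "C": 9.0, "G": 110.0, "T": 30.0},
    {"A": 60.0, "C": 5.0, "G": 6.0, "T": 4.0},
]
calls = call_positions(example)
target_read = "".join(c["target_base"] for c in calls)
```

A large gap between the called and second-highest intensities corresponds to the good mismatch discrimination the text reports; a small gap marks a position whose call deserves less trust.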

For example, in the model system involving the cystic
fibrosis point mutation G480C, the A-lane probe whose
position of substitution corresponds to the position of the
mutation does not bind much wild-type target, because in the
wild-type sequence, a G occupies that position. However, it
binds mutant target very well, allowing one to infer correctly
that the mutation involves a change of that G to a T.
Similarly, in the case of the three-base deletion in cystic
fibrosis known as ΔF508, the T-lane probe that binds mutant
target so intensely is responding to the fact that the deletion
has brought a CAT sequence into the position occupied by
a CTT sequence in the wild-type target. The DNA chips of
the invention can be used to detect and sequence not only
known mutations in an organism's genome but also new
mutations not previously characterized. The DNA chips and
methods of the invention can also be used to detect specific
sequences in other CFTR exons as well as other human
genes for purposes of research and clinical genetic analysis,
as demonstrated below.
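
The inference described above (an A-lane probe binding a target that carries a T) follows directly from base complementarity, and a small worked example makes it concrete. The sketch below is an illustration under stated assumptions: the 9-base "wild-type" and "mutant" windows are invented and are not CFTR sequence, the helper names are hypothetical, and probes are written aligned base-for-base against the target so that mismatches can be counted by position.

```python
# Hypothetical sketch: which substitution-lane probe matches which target.
# Sequences are made up for illustration; they are not CFTR sequences.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_probes(target_window, sub_index):
    """Build the four lane probes for one window of the wild-type target.

    Each probe is the base-by-base complement of the window, except that the
    base at sub_index is replaced by the lane base (A, C, G, or T).
    """
    perfect = [COMPLEMENT[base] for base in target_window]
    probes = {}
    for lane in "ACGT":
        probe = list(perfect)
        probe[sub_index] = lane
        probes[lane] = "".join(probe)
    return probes


def mismatches(probe, target_window):
    """Count positions where the probe base does not pair with the target base."""
    return sum(1 for p, t in zip(probe, target_window) if COMPLEMENT[p] != t)


wild_type = "TTAGGCAAG"  # made-up window with a G at index 4
mutant = "TTAGTCAAG"     # same window with G -> T at index 4

probes = substitution_probes(wild_type, sub_index=4)
wt_mismatches = {lane: mismatches(p, wild_type) for lane, p in probes.items()}
mu_mismatches = {lane: mismatches(p, mutant) for lane, p in probes.items()}
```

With these inputs the C-lane probe is the only perfect match to the wild-type window and the A-lane probe is the only perfect match to the mutant window, which mirrors the reasoning in the text: strong binding in the A lane implies a T in the target.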

Detection of Specific Human Mitochondrial DNA 
Sequences with DNA Chips 

As noted above, the present invention provides DNA chips on which a known DNA sequence
is represented as an array of overlapping oligonucleotides on a solid support. This set
of oligonucleotides is used to probe a target nucleic acid comprising the known
sequence, allowing mutations to be detected. As also noted above, there are advantages in
some applications to using a minimal set of oligonucleotides specific to the sequence of
interest, rather than a set of all possible N-mers. Some of these advantages include:
(i) each position in the array is highly informative, whether or not hybridization
occurs; (ii) nonspecific hybridization is minimized; (iii) it is straightforward to
correlate hybridization differences with sequence differences, particularly with
reference to the hybridization pattern of a known standard; and (iv) the ability to
address each probe independently during synthesis, using high resolution
photolithography, allows the array to be designed and optimized for any sequence. For
example, the length of any probe can be varied independently of the others.

The present invention illustrates these advantages by providing DNA chips and analytical
methods for detecting specific sequences of human mitochondrial DNA. In one preferred
embodiment, the invention provides a DNA chip for analyzing sequences contained in a
1.3 kb fragment of human mitochondrial DNA from the "D-loop" region, the most
polymorphic region of human mitochondrial DNA. One such chip comprises a set of 269
overlapping oligonucleotide probes of varying length in the range of 9-14 nucleotides
with varying overlaps arranged in ~600x600 micron features or synthesis sites in an
array 1 cm x 1 cm in size. The probes on the chip are shown in columnar form below. An
illustrative mitochondrial DNA chip of the invention comprises the following probes
(X, Y coordinates are shown, followed by the sequence; "DL3" represents the 3'-end of
the probe, which is covalently attached to the chip surface.)
DL3AGTGGGGTATTT (SEQ ID. NO:26)
DL3GGGTATTTAGTT (SEQ ID. NO:27)
DL3TTAGTTTATCCAA (SEQ ID. NO:28)
DL3ATCCAAACCAGG (SEQ ID. NO:29)
DL3ACCAGGATCGCA (SEQ ID. NO:30)
DL3CGTGTGTCTGTGG (SEQ ID. NO:31)
DL3CGTGTGTGTGTGGC (SEQ ID. NO:32)
DL3TCGTGTGTGTGTGG (SEQ ID. NO:33)



DL3GTAGGATGGGTC
DL3AGGATGGGTCGT
DL3GATGGGTCGTGT
DL3TGGCGACGATTG
DL3GCGACGATTGGG
DL3TGGGGGGGA
DL3GAGGGGGCG
DL3GGAGGGGGCGA
DL3GAGGGGGCGA
DL3GGCTTGGTTGG
DL3GGTTGU1 nGGG
DL3TCGCGTTTCTAG
DL3GTTTCTAGTGGG
DL3AGTGGGGGGTGT
DL3GGGGTGTCAAAT
DL3GTCAAATACATCG
DL3ACATCGAATGGAG
DL3CGAATGGAGGAG
DL3GAGGAGTTTCGT
DL3TTTTCGTTATGTGA
DL3 ATGTGAC" ill "1 AC
DL3G AC! 1 1 1ACAAAT
DL3AAATCTGCCCGA
DL3AATCTGCCCGAG
DL3CCCGAGTGTAGT
DL3AGTCTAGTGGGG
DL3GGGAGGGTGAG
DL3GGTGAGGGTATG
DL3GGTATGATCATTAG
DL3GATTAGAGTAAGT
DL3TTAGAGTAAGTTA



(SEQ ID. NO:34)
(SEQ ID. NO:35)
(SEQ ID. NO:36)
(SEQ ID. NO:37)
(SEQ ID. NO:38)
(SEQ ID. NO:39)
(SEQ ID. NO:40)
(SEQ ID. NO:41)
(SEQ ID. NO:42)
(SEQ ID. NO:43)
(SEQ ID. NO:44)
(SEQ ID. NO:45)
(SEQ ID. NO:46)
(SEQ ID. NO:47)
(SEQ ID. NO:48)
(SEQ ID. NO:49)
(SEQ ID. NO:50)
(SEQ ID. NO:51)
(SEQ ID. NO:52)
(SEQ ID. NO:53)
(SEQ ID. NO:54)
(SEQ ID. NO:55)
(SEQ ID. NO:56)
(SEQ ID. NO:57)
(SEQ ID. NO:58)
(SEQ ID. NO:59)
(SEQ ID. NO:60)
(SEQ ID. NO:61)
(SEQ ID. NO:62)



9 2 DL3GGTAGGATGGGT (SEQ ID. NO:67)
10 2 DL3GGATGGGTCGTG (SEQ ID. NO:68)
11 2 DL3GCTCGTGTGTGT (SEQ ID. NO:69)
12 2 DL3GTGTGTGTGGCG (SEQ ID. NO:70)
13 2 DL3TGTGGCGACGAT (SEQ ID. NO:71)
14 2 DL3GACGATTGGGGT (SEQ ID. NO:72)
15 2 DL3ATTGGGGTATGG (SEQ ID. NO:73)
16 2 DL3GTATGGCGCTTG (SEQ ID. NO:74)
0 3 DL3GGATTGTGGTCG (SEQ ID. NO:75)
1 3 DL3TGGTCGGATTGG (SEQ ID. NO:76)
2 3 D LXH3 ATTGGTCTAAA (SEQ ID. NO:77)
3 3 DL3TCTAAAGTTTAAA (SEQ ID. NO:78)
4 3 DL3GTTTAAAATAGAA (SEQ ID. NO:79)
5 3 DL3ATAGAAAAACCG (SEQ ID. NO:80)
6 3 DL3AGAAAAACCGC (SEQ ID. NO:81)
7 3 DL3AACCGCCATAC (SEQ ID. NO:82)
8 3 DL3CCATACGTGAAAA (SEQ ID. NO:83)
9 3 DL3ACGTGAAAATTGT (SEQ ID. NO:84)
10 3 DL3AATTGTCAGTGGG (SEQ ID. NO:85)
11 3 DL3TGTCAGTGGGGG (SEQ ID. NO:86)
12 3 DL3TGGGGTTCA (SEQ ID. NO:87)
13 3 DL3GGGTTGATTGTGT (SEQ ID. NO:88)
14 3 DL3TTGTGTAATAAAA (SEQ ID. NO:89)
15 3 DL3AATAAAAGGGGA (SEQ ID. NO:90)
16 3 DL3TAAAAGGGGAGG (SEQ ID. NO:91)
0 4 DL301 1 1 1 11AAAGG (SEQ ID. NO:92)
1 4 DUliUAAAGGTGC (SEQ ID. NO:93)
2 4 DL3AGGTGGTTTGG (SEQ ID. NO:94)
3 4 DL3TTGGGGGGGAG (SEQ ID. NO:95)
4 4 DL3GGAGGGGGCG (SEQ ID. NO:96)
5 4 DL3GGGGCGAAGAC (SEQ ID. NO:97)
6 4 DL3GAAGACCGGATG (SEQ ID. NO:98)
7 4 DL3CCGGATGTCGTG (SEQ ID. NO:99)
8 4 DUGTCGTGAAi UU1 (SEQ ID. NO:100)
9 4 DL3CGTGAATTTGTGT (SEQ ID. NO:101)
10 4 DL3TTGTGTAGAGACG (SEQ ID. NO:102)
11 4 DL3TAGAGACGGTTT (SEQ ID. NO:103)
12 4 DL3ACGGTTTGGGG (SEQ ID. NO:104)
13 4 DL3TGGG O! 11 H OT (SEQ ID. NO:105)
14 4 DL3GGG 11111 GTTT (SEQ ID. NO:106)
5 2 DL3AAGTTATGTTGGG (SEQ ID. NO:63)
6 2 DL3GTTCGGGGCG (SEQ ID. NO:64)
7 2 DL3GGGGGGGGTA (SEQ ID. NO:65)
8 2 DL3GCGGGTAGGAT (SEQ ID. NO:66)
2 5 DL3ACACAATTAATTAA (SEQ ID. NO:111)
3 5 DL3AATTAATTACGAA (SEQ ID. NO:112)
4 5 DL3TACGAACATCCTG (SEQ ID. NO:113)
5 5 DL3ACGAACATCCTGT (SEQ ID. NO:114)
6 5 DL3TCCTGTATTATTA (SEQ ID. NO:115)
7 5 DL3GTATTATTATTGTT (SEQ ID. NO:116)
8 5 DL3ATTGTTAAACTTA (SEQ ID. NO:117)
9 5 DL3AAACTTACAGACG (SEQ ID. NO:118)
10 5 DL3ACAGACGTGTCG (SEQ ID. NO:119)
11 5 DL3GTGTCGGTGAAA (SEQ ID. NO:120)
12 5 D L3 GTGAAAGGKjTGT (SEQ ID. NO:121)
13 5 DL3GGTGTCTCTGTAG (SEQ ID. NO:122)
14 5 DL3TGTGTCTGTAGTA (SEQ ID. NO:123)
15 5 DL3GTAGTATTGTTTT (SEQ ID. NO:124)
16 5 DL3AGTATT G1 1 11 IT (SEQ ID. NO:125)
0 6 DL3CCTCGTGGGATA (SEQ ID. NO:126)
1 6 DL3TGGGATACAGCG (SEQ ID. NO:127)
2 6 DL3GATACAGCGTCAT (SEQ ID. NO:128)
3 6 DL3GCGTCATAGACAG (SEQ ID. NO:129)
4 6 DL3AGACAGAAACTAA (SEQ ID. NO:130)
5 6 DL3CAGAAACTAAGGA (SEQ ID. NO:131)
6 6 DL3TAAGGACGGAGT (SEQ ID. NO:132)
7 6 DL3GACGGAGTAGGA (SEQ ID. NO:133)
8 6 DL3GTAGGATAATAAA (SEQ ID. NO:134)
9 6 DL3TAATAAATAGCG (SEQ ID. NO:135)
10 6 DL3ATAGCGTAGGAT (SEQ ID. NO:136)
11 6 DL3TAGCGTAGGATG (SEQ ID. NO:137)
12 6 DL3AGGATGCAAGTT (SEQ ID. NO:138)
13 6 DL3ATGCAAGTTATAA (SEQ ID. NO:139)
14 6 DL3GTTATAATGTCCG (SEQ ID. NO:140)
15 6 DL3ATCTCCGCTTGT (SEQ ID. NO:141)
16 6 DL3TCCGCTTGTATG (SEQ ID. NO:142)
0 7 DL3GTGAGTGCCCTC (SEQ ID. NO:143)
1 7 DL3TGCCCTCGAGAG (SEQ ID. NO:144)
2 7 DL3CCTCGAGAGGTA (SEQ ID. NO:145)
3 7 DL3AGAGGTACGTAA (SEQ ID. NO:146)
4 7 DL3ACGTAAACCATA (SEQ ID. NO:147)
5 7 DL3ACCATAAAAGCAG (SEQ ID. NO:148)
6 7 DL3AAAGCAGACCC (SEQ ID. NO:149)
7 7 DL3AGACCCCCCAT (SEQ ID. NO:150)
8 7 DL3CCCCCATACGT (SEQ ID. NO:151)
9 7 DL3CATACGTGCGCT (SEQ ID. NO:152)
10 7 DL3GTGCGCTATCAG (SEQ ID. NO:153)
11 7 DL3GCGCTATCAGTA (SEQ ID. NO:154)
12 7 DL3TCAGTAACGCTC (SEQ ID. NO:155)
13 7 DL3GTAACGCTCTGC (SEQ ID. NO:156)
9 10 DL3AGTCTATCCCCA (SEQ ID. NO:203)
10 10 DL3ATCCCCAGGGA (SEQ ID. NO:204)
11 10 DL3CAGGGAACTGGT (SEQ ID. NO:205)
12 10 DL3ACTGGTGGTAGG (SEQ ID. NO:206)
13 10 DL3CTGGTGGTAGGA (SEQ ID. NO:207)
14 10 DL3GTAGGAGGCACA (SEQ ID. NO:208)
15 10 DL3GGCACATTTAGT (SEQ ID. NO:209)
16 10 DL3TTTAGTTATAGGG (SEQ ID. NO:210)
0 11 DL3AGGTTTACGGTG (SEQ ID. NO:211)
1 11 DL3TACGGTGGGGA (SEQ ID. NO:212)
2 11 DL3GTGGGGAGTCG (SEQ ID. NO:213)
3 11 DL3GGGAGTGGGTGA (SEQ ID. NO:214)
4 11 DL3GGGTGATCCTATG (SEQ ID. NO:215)
5 11 DL3CCTATGGTTGTTT (SEQ ID. NO:216)
6 11 DL3GGTTGTTTGGATG (SEQ ID. NO:217)
7 11 DL3GTTTGGATGGGT (SEQ ID. NO:218)
8 11 DL3ATGGGTGGGAAT (SEQ ID. NO:219)
9 11 DL3GGGAATTGTCATG (SEQ ID. NO:220)
10 11 DL3GTCATGTATCATGT (SEQ ID. NO:221)
11 11 DL3TCATGTATTTCGG (SEQ ID. NO:222)
12 11 DL3TATTTCGGTAAA (SEQ ID. NO:223)
13 11 DL3TTCGGTAAATGG (SEQ ID. NO:224)
14 11 DL3GTAAATGGCATGT (SEQ ID. NO:225)
15 11 DL3GCATGTAATCGTG (SEQ ID. NO:226)
16 11 DL3GTAATCGTGTAAT (SEQ ID. NO:227)
5 12 DL3GGGAGGGGTAC (SEQ ID. NO:228)
6 12 DL3GGGTACGAATGT (SEQ ID. NO:229)
7 12 DL3ACGAATGTTCGTT (SEQ ID. NO:230)
8 12 DL3TGTTCGTTCATGT (SEQ ID. NO:231)
9 12 DL3CGTTCATGTCGTT (SEQ ID. NO:232)
15 4 DL3TTGTTTCTTGGG (SEQ ID. NO:107)
16 4 DL3TCTTGGGATTGTC (SEQ ID. NO:108)
0 5 DL3TGTATGAATGATTT (SEQ ID. NO:109)
1 5 DL3TGATTTCACACAA (SEQ ID. NO:110)
14 7 DL3CTCTGCGACCTC (SEQ ID. NO:157)
15 7 DL3GACCTCGGCCT (SEQ ID. NO:158)
16 7 DL3TCGGCCTCGTG (SEQ ID. NO:159)
0 8 DL3GATGAAGTCCCAG (SEQ ID. NO:160)
1 8 DL3AGTCCCAGTATTT (SEQ ID. NO:161)
2 8 DL3GTATTTCGGATTT (SEQ ID. NO:162)
3 8 DL3TCGGATTTATCG (SEQ ID. NO:163)
4 8 DL3GATTTATCGGGT (SEQ ID. NO:164)
5 8 DL3ATCGGGTGTGCA (SEQ ID. NO:165)
6 8 DL3TGTGCAAGGGGA (SEQ ID. NO:166)
7 8 DL3CAAGGGGAATTT (SEQ ID. NO:167)
8 8 D L3G AArTTATTCTGXr\ (SEQ ID. NO:168)
9 8 DL3TCTGTAGTGCTAC (SEQ ID. NO:169)
10 8 DL3GTAGTGCTACCT (SEQ ID. NO:170)
11 8 DL3GCTACCTAGTAG (SEQ ID. NO:171)
12 8 DL3CTAGTAGTCCAGA (SEQ ID. NO:172)
13 8 DL3TCCAGATAGTGGG (SEQ ID. NO:173)
14 8 DL3AGATAGTGGGATA (SEQ ID. NO:174)
15 8 DL3GGGATAATTGGT (SEQ ID. NO:175)
16 8 DL3TAATTGGTGAGTG (SEQ ID. NO:176)
0 9 DL3TATAGGGCGTGT (SEQ ID. NO:177)
1 9 DL3GGGCGTGTTCTCA (SEQ ID. NO:178)
2 9 DL3GTGTTCTCACGAT (SEQ ID. NO:179)
3 9 DL3TCACGATGAGAGG (SEQ ID. NO:180)
4 9 DL3ATGAGAGGAGCG (SEQ ID. NO:181)
5 9 DL3AGGAGCGAGGC (SEQ ID. NO:182)
6 9 DL3CGAGGCCCGG (SEQ ID. NO:183)
7 9 DL3GCCCGGGTATT (SEQ ID. NO:184)
8 9 DL3CGGGTATTGTGA (SEQ ID. NO:185)
9 9 DL3GTGAACCCCCAT (SEQ ID. NO:186)
10 9 DL3CCCCATCGATTT (SEQ ID. NO:187)
11 9 DL3ATCGATTTCACTT (SEQ ID. NO:188)
12 9 DL3TTTCACTTGACAT (SEQ ID. NO:189)
13 9 DLnTGACATAGAGCT (SEQ ID. NO:190)
14 9 DL3TAGAGCTGTAGAC (SEQ ID. NO:191)
15 9 DL3GTAGACCAAGGA (SEQ ID. NO:192)
16 9 DL3ACCAAGGATGAAG (SEQ ID. NO:193)
0 10 DL3CGTGTAATGTCAG (SEQ ID. NO:194)
1 10 DL3TGTCAGTTTAGGG (SEQ ID. NO:195)
2 10 DL3TCAGTTTAGGGA (SEQ ID. NO:196)
3 10 DL3TAGGGAAGAGCA (SEQ ID. NO:197)
4 10 DL3AAGAGCAGGGGT (SEQ ID. NO:198)
5 10 DL3CAGGGGTACCTA (SEQ ID. NO:199)
6 10 DL3GGTACCTACTGG (SEQ ID. NO:200)
7 10 DL3TACTGGGGGGA (SEQ ID. NO:201)
8 10 DL3GGGGGAGTCTAT (SEQ ID. NO:202)
11 13 DL3CATGTA1 1 1'U GG (SEQ ID. NO:246)
12 13 DLTrnTGGGTTAGG (SEQ ID. NO:247)
13 13 DL3GGGTTAGGATGT (SEQ ID. NO:248)
14 13 DL3GGATGTAGTTTTG (SEQ ID. NO:249)
15 13 DL3TGTAGTTTTGGG (SEQ ID. NO:250)
16 13 DL3TTTGGGGGAGG (SEQ ID. NO:251)
5 14 DL3GGGTTCATAACTG (SEQ ID. NO:252)
6 14 DL3ATAACTGAGTGGG (SEQ ID. NO:253)
7 14 DL3AACTGAGTGGGT (SEQ ID. NO:254)
8 14 DL3GTGGGTAGTTGT (SEQ ID. NO:255)
9 14 DL3GTAGTTGTTGGC (SEQ ID. NO:256)
10 14 DL3GTTGGCGATACA (SEQ ID. NO:257)
11 14 DL3CGATACATAAAAG (SEQ ID. NO:258)
12 14 DL3TAAAAGCATGTAA (SEQ ID. NO:259)
13 14 DL3GCATCTAATGACG (SEQ ID. NO:260)
14 14 DL3ATGACGGTCGGT (SEQ ID. NO:261)
15 14 DL3GTCGGTGGTACT (SEQ ID. NO:262)
16 14 DL3GGTACTTATAACA (SEQ ID. NO:263)
5 15 DL3TCGATTCTAAGAT (SEQ ID. NO:264)
6 15 DL3TAAGATTAAATTT (SEQ ID. NO:265)
7 15 DL3AAATTTGAATAAG (SEQ ID. NO:266)
8 15 DL3AATAAGAGACAAG (SEQ ID. NO:267)
9 15 DL3AAGAGACAAGAAA (SEQ ID. NO:268)
10 15 DL3AAGAAAGTACCC (SEQ ID. NO:269)
11 15 DL3AAAGTACCCCTT (SEQ ID. NO:270)
12 15 DL3CCCCTTCGTCTA (SEQ ID. NO:271)
13 15 DL3CTTCGTCTAAAC (SEQ ID. NO:272)
14 15 DL3CTAAACCCATGG (SEQ ID. NO:273)
15 15 DL3AACCCATGGTGG (SEQ ID. NO:274)
16 15 DL3TGGTGGGTTCAT (SEQ ID. NO:275)
10 12 DL3GTCGTTAGTTGG (SEQ ID. NO:233)
11 12 DL3TAGTTGGGAGTT (SEQ ID. NO:234)
12 12 DL3GGAGTTGATAGTG (SEQ ID. NO:235)
13 12 DL3ATAGTGTGTAGTT (SEQ ID. NO:236)
14 12 DL3GTGTAGTTGACGT (SEQ ID. NO:237)
15 12 DL3TGACGTTGAGGT (SEQ ID. NO:238)
16 12 DL3CGTTGAGGTTTA (SEQ ID. NO:239)
5 13 DL3TATAACATGCCAT (SEQ ID. NO:240)
6 13 DL3AACATGCCATGGT (SEQ ID. NO:241)
7 13 DL3CCATGGTATTAT (SEQ ID. NO:242)
8 13 DL3ATTTATGAACTGG (SEQ ID. NO:243)
9 13 DL3AACTGGTGGACAT (SEQ ID. NO:244)
10 13 DL3TGGACATCATGTA (SEQ ID. NO:245)



5 16 DL3TTGGAAAAAGGT (SEQ ID. NO:276)
6 16 DL3AAAAGGTTCCTG (SEQ ID. NO:277)
7 16 DL3GGTTCCTGTTTA (SEQ ID. NO:278)
8 16 DL3CCTGTTTAGTCTC (SEQ ID. NO:279)
9 16 DL3TTAGTCTCTTTTT (SEQ ID. NO:280)
10 16 DL3CTTTTTCAGAAAT (SEQ ID. NO:281)
11 16 DL3AGAAATTGAGGTG (SEQ ID. NO:282)
12 16 DL3AAATTGAGGTGGT (SEQ ID. NO:283)
13 16 DL3GGTGGTAATCGT (SEQ ID. NO:284)
14 16 DL3TAATCGTGGGTT (SEQ ID. NO:285)
15 16 DL3GTGGGTTTCGAT (SEQ ID. NO:286)
16 16 DL3GGTTTCGATTCT (SEQ ID. NO:287)



No probes were present in positions X, Y=0, 12 to X, Y=4, 12; X, Y=0, 13 to X, Y=4, 13;
X, Y=0, 14 to X, Y=4, 14; X, Y=0, 15 to X, Y=4, 15; and X, Y=0, 16 to X, Y=4, 16. The
length of each of the probes on the chip was variable to minimize differences in melting
temperature and potential for cross-hybridization. Each position in the sequence is
represented by at least one probe and most positions are represented by 2 or more
probes. As noted above, the amount of overlap between the oligonucleotides varies from
probe to probe. FIG. 9 shows the human mitochondrial genome; "OH" is the H strand origin
of replication, and arrows indicate the cloned sequence.

DNA was prepared from hair roots of six human donors (mt1 to mt6) and then amplified by
PCR and cloned into M13; the resulting clones were sequenced using chain terminators to
verify that the desired specific sequences were present. DNA from the sequenced M13
clones was amplified by PCR, transcribed in vitro, and labeled with fluorescein-UTP using
T3 RNA polymerase. The 1.3 kb RNA transcripts were fragmented and hybridized to the chip.
The results showed that each different individual had DNA that produced a unique
hybridization fingerprint on the chip and that the differences in the observed patterns
could be correlated with differences in the cloned genomic DNA sequence. The results also
demonstrated that very long sequences of a target nucleic acid can be represented
comprehensively as a specific set of overlapping oligonucleotides and that arrays of such
probe sets can be usefully applied to genetic analysis.

The sample nucleic acid was hybridized to the chip in a solution composed of 6xSSPE, 0.1%
Triton X-100 for 60 minutes at 15° C. The chip was then scanned by confocal scanning
fluorescence microscopy. The individual features on the chip were 588x588 microns, but
the lower left 5x5 square features in the array did not contain probes. To quantitate the
data, pixel counts were measured within each synthesis site. Pixels represent 50x50
microns. The fluorescence intensity for each feature was scaled to a mean determined from
27 bright features. After scanning, the chip was stripped and rehybridized; all six
samples were hybridized to the same chip. FIG. 10 shows the image observed from the mt4
sample on the DNA chip. FIG. 11 shows the image observed from the mt5 sample on the DNA
chip. FIG. 12 shows the predicted difference image between the mt4 and mt5 samples on the
DNA chip based on mismatches (see Anderson et al., 1981, Nature 290: 457-465,
incorporated herein by reference). FIG. 13 shows the actual difference image observed.

The results show that, in almost all cases, mismatched probe/target hybrids resulted in
lower fluorescence intensity than perfectly matched hybrids. Nonetheless, some probes
detected mutations (or specific sequences) better than others, and in several cases, the
differences were within noise levels. Improvements can be realized by increasing the
amount of overlap between probes and hence overall probe density and, for duplex DNA
targets, using a second set of probes, either on the same or a separate chip,
corresponding to the second strand of the target. FIG. 14, in sheets 1 and 2, shows a
plot of normalized intensities across rows 10 and 11 of the array and a tabulation of the
mutations detected.

FIG. 15 shows the discrimination between wild-type and mutant hybrids obtained with this
chip. The median of the six normalized hybridization scores for each probe was taken. The
graph plots the ratio of the median score to the normalized hybridization score versus
mean counts. On this graph, a ratio of 1.6 and mean counts above 50 yield no false
positives, and while it is clear that detection of some mutants can be improved,
excellent discrimination is achieved, considering the small size of the array. FIG. 16
illustrates how the identity of the base mismatch may influence the ability to
discriminate mutant and wild-type sequences more than the position of the mismatch within
an oligonucleotide probe. The mismatch position is expressed as % of probe length from
the 3'-end. The base change is indicated on the graph. These results show that the DNA
chip increases the capacity of the standard reverse dot blot format by orders of
magnitude, extending the power of that approach many fold, and that the methods of the
invention are more efficient and easier to automate than gel-based methods of nucleic
acid sequence and mutation analysis.

These advantages become more apparent as chips with more and more probes are employed. To
illustrate, the present invention provides a DNA chip for analyzing human mitochondrial
DNA (mtDNA) that "tiles" through 648 nucleotides of human H strand mtDNA from positions
16280 to 356. The probes in the array are 15 nucleotides in length, and each position in
the target sequence is represented by a set of 4 probes (A, C, G, T substitutions), which
differed from one another at position 7 from the 3'-end. The array consists of 13 blocks
of 4x50 probes: each block scans through 50 nucleotides of contiguous mtDNA sequence. The
blocks are separated by blank rows. The 4 corner columns contain control probes; there
are a total of 2600 probes in a [...] cm square area (feature), and each area [...].
[...] was prepared by PCR amplification [...] DNA spanning [...] polymerase promoter
sequences and in vitro transcription to produce fluorescein-UTP labeled RNA. The RNA was
fragmented and hybridized to the oligonucleotide array in a solution composed of 6xSSPE,
0.1% Triton X-100 for 60 minutes at 18° C. Unhybridized material was washed away with
buffer, and the chip was scanned at 25 micron pixel resolution.
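
The tiling scheme for the larger mtDNA array (15-mer probes, four substitution lanes per position, substitution at position 7 from the 3'-end, 13 blocks of 4x50 probes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the design software behind the chip: the function names are hypothetical, the demonstration target is an invented 22-base string rather than mtDNA sequence, and probes are written in 3'-to-5' orientation aligned base-for-base against the target so that "position 7 from the 3'-end" becomes index 6.

```python
# Hypothetical sketch of 4-lane tiling with 15-mer probes.
# The demo target is made up; it is not mitochondrial sequence.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
LANES = "ACGT"


def tiled_probe_sets(target, probe_len=15, sub_pos_from_3prime=7):
    """Return one probe set per window of the target.

    Each set is (interrogated_target_index, {lane: probe}), where the four
    probes differ only at the substitution position.
    """
    sub_index = sub_pos_from_3prime - 1  # probes are written 3'->5'
    sets = []
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        perfect = [COMPLEMENT[base] for base in window]
        lanes = {}
        for lane in LANES:
            probe = list(perfect)
            probe[sub_index] = lane
            lanes[lane] = "".join(probe)
        sets.append((start + sub_index, lanes))
    return sets


def probe_count(blocks=13, lanes=4, positions_per_block=50):
    """Probe total for the block layout described in the text."""
    return blocks * lanes * positions_per_block


demo_target = "ACGTTGCAAGCTTAGGCATCGA"  # invented 22-mer
sets = tiled_probe_sets(demo_target)
first_index, first_lanes = sets[0]
total = probe_count()
```

For each window exactly one of the four lane probes is fully complementary to the reference, and the block arithmetic (13 x 4 x 50) reproduces the 2600 probes stated in the text.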
[...] are in bold. FIG. 18 shows the fluorescence image produced by scanning the chip
when hybridized to this sample. About 95% of the sequence could be read correctly from
only one strand of the original duplex target nucleic acid. Although some probes did not
provide excellent discrimination and some probes did not appear to hybridize to the
target efficiently, excellent results were achieved. The target sequence differed from
the probe set at six positions: 4 transitions and 2 insertions. All 4 transitions were
detected, and specific probes could readily be incorporated into the array to detect
insertions or deletions. FIG. 19 illustrates the detection of 4 transitions in the target
sequence relative to the wild-type probes on the chip.

These results illustrate that longer sequences can be read using the DNA chips and
methods of the invention, as compared to conventional sequencing methods, where reading
length is limited by the resolution of gel electrophoresis. Similar results were observed
when genomic DNA samples were prepared from human hair roots. Hybridization and signal
detection require less than an hour and can be readily shortened by appropriate choice of
buffers, temperatures, probes, and reagents. In principle, longer sequence reads can be
obtained than by conventional sequencing, where reading length is limited by the
resolution of gel electrophoresis.

P53 Sequencing and Diagnostic DNA Chips

p53 is a tumor suppressor gene that has been found to be mutated in most forms of cancer
(see Levine et al., 1991, Nature 351: 453-456, and Hollstein et al., 1991, Science 253:
49-53, each of which is incorporated herein by reference). In addition, there is a
hereditary syndrome, Li-Fraumeni, in which individuals inherit mutant alleles of p53 and
tend to have cancer at relatively young ages (Frebourg et al., 1992, PNAS 89: 6413-6417,
incorporated herein by reference). During the development of a cancer, p53 is
inactivated. The course of p53 inactivation generally involves a mutation in one copy of
p53 and is often followed by deletion of the other copy. After p53 is inactivated,
chromosomal abnormalities begin to appear in tumors. In the best understood form of
cancer, colorectal cancer, well over 50%, perhaps 80%, of all patients with tumors have
p53 mutations. In addition, p53 mutations have been found in a high proportion of lung,
breast, and other tumors (Rodrigues et al., 1990, PNAS 87: 7555-7559, incorporated herein
by reference). According to data presented by David Sidransky (1992 San Diego
Conference), over 400 mutations in p53 are known.

The p53 gene spans 20 kbp in humans and has 11 exons, 10 of which are protein coding (see
Tominaga et al., 1992, Critical Reviews in Oncogenesis 3: 257-282, incorporated herein by
reference). The gene produces a 53 kilodalton phosphoprotein that regulates DNA
replication. The protein acts to halt replication at the G1/S boundary in the cell cycle
and is believed to act as a "molecular policeman," shutting down replication when the DNA
is damaged or blocking the reproduction of DNA viruses (see Lane, 1992, Nature 358:
15-16, incorporated herein by reference). There is substantial interest in the cancer
research community in analyzing p53 mutations. The NCI is currently funding contracts to
characterize the p53 mutation spectra caused by various carcinogens. In addition, there
are research projects which involve sequencing p53 from spontaneously arising tumors. A
major resource in these studies is the huge supply of biopsy material stored in paraffin
blocks. Also, there are projects which are aimed at analyzing the relationship between
the p53 mutation and the functioning of the resulting protein. Furthermore, there are
projects looking at the germline inheritance of p53 mutations and the development of
cancer. The present invention provides useful DNA chips and methods for such studies.

In addition, the present invention also provides a diagnostic test kit and method and p53
probes immobilized on a DNA chip in an organized array. Currently available diagnostic
tests for cancer typically have a sensitivity of about 50%. The present invention
provides significant advantages over such tests, and in one embodiment provides a method
for detecting cancer-causing mutations in p53 that involves the steps of (1) obtaining a
biopsy, which is optionally fractionated by cryostat sectioning to enrich tumor cells to
about 80% of the total cell population. The DNA or RNA is then extracted, amplified, and
analyzed with a DNA chip for the presence of p53 mutations correlated with malignancy.

To illustrate the value of the DNA chips of the present invention in such a method, a DNA
chip was synthesized by the VLSIPS™ method to provide an array of overlapping probes
which represent or tile across a 60 base region of exon 6 of the p53 gene. To demonstrate
the ability to detect substitution mutations in the target, twelve different single
substitution mutations (wild type and three different substitutions at each of three
positions) were represented on the chip along with the wild type. Each of these mutations
was represented by a series of twelve 12-mer oligonucleotide probes, which were
complementary to the wild type target except at the one substituted base. Each of the
twelve probes was complementary to a different region of the target and contained the
mutated base at a different position (e.g., if the substitution was at base 32, the set
of probes would be complementary, with the exception of base 32, to regions of the target
21-32, 22-33, and 32-43). This enabled investigation of the effect of the substitution
position within the probe. The alignment of some of the probes with a 12-mer model target
nucleic acid is shown in FIG. 20.

To demonstrate the effect of probe length, an additional series of ten 10-mer probes was
included for each mutation (see FIG. 21). In the vicinity of the substituted positions,
the wild-type sequence was represented by every possible overlapping 12-mer and 10-mer
probe. To simplify comparisons, the probes corresponding to each varied position were
arranged on the chip in the rectangular regions with the following structure: each row of
cells represents one substitution, with the top row representing the wild type. Each
column contains probes complementary to the same region of the target, with probes
complementary to the 3'-end of the target on the left and probes complementary to the
5'-end of the target on the right. The difference between two adjacent columns is a
single base shift in the positioning of the probes. Whenever possible, the series of
10-mer probes were placed in four rows immediately underneath and aligned with the 4 rows
of 12-mer probes for the same mutation.

To provide model targets, 5' fluoresceinated 12-mers containing all possible
substitutions in the first position of codon 192 were synthesized (see the starred
position in the target in FIG. 20). Solutions containing 10 nM target DNA in 6xSSPE,
0.25% Triton X-100 were hybridized to the chip at room temperature for several hours.
While target nucleic acid was hybridized to the chip, the fluorophores on the chip were
excited by light from an argon laser, and the chip was scanned with an autofocusing
confocal microscope. The emitted signals were processed by a PC to produce an image using
image analysis software. By 1 to 3 hours, the signal had reached a plateau; to remove the
hybridized target and
allow hybridization to another target, the chip was stripped with 60% formamide, 2xSSPE
at 17° C. for 5 minutes. The washing buffer and temperature can vary, but the buffer
typically contains 2-to-3xSSPE, 10-to-60% formamide (one can use multiple washes,
increasing the formamide concentration by 10% each wash, and scanning between washes to
determine when the wash is complete), and optionally a small percentage of Triton X-100,
and the temperature is typically in the range of 15° to 18° C.

Very distinct patterns were observed after hybridization with targets with 1 base
substitutions and visualization with a confocal microscope and software analysis, as
shown in FIG. 22. In general, the probes which form perfect matches with the target
retain the highest signal. For example, in the first image in FIG. 22, the 12-mer probes
that form perfect matches with the wild-type (WT) target are in the first row (top). The
12-mer probes with single base mismatches are located in the second, third, and fourth
rows and have much lower signals. The data is also depicted graphically in FIG. 23. On
each graph, the X ordinate is the position of the probe in its row on the chip, and the Y
ordinate is the signal at that probe site after hybridization.

When a target with a different one base substitution is hybridized, the complementary set
of probes has the highest signal (see pictures 2, 3, and 4 in FIG. 22 and graphs 2, 3,
and 4 in FIG. 23). In each case, the probe set with no mismatches with the target has the
highest signals. Within a 12-mer probe set, the signal was highest at position 6 or 7.
The graphs show that the signal difference between 12-mer probes at the same X ordinate
tended to be greatest at positions 5 and 8 when the target and the complementary probes
formed 10 base pairs and 11 base pairs, respectively. Because tumors often have both WT
and mutant p53 genes, mixed target populations were also hybridized to the chip, as

For sequencing, the p53 DNA can be cloned from the sample or directly amplified from
genomic DNA by PCR. If genomic PCR is used, then the DNA can be diluted prior to
amplification so that a single copy of the gene is amplified. For diagnostic purposes,
the genomic DNA can be isolated from a tumor biopsy in which the tumor cells may be the
majority population. As noted above, the proportion of tumor cells in a sample can be
enriched by cryostat sectioning. DNA can also be isolated and amplified from tumor
samples stored in paraffin blocks.

The p53 DNA in the sample can be amplified by PCR (although other amplification methods
can be used) using 3-4 primer pairs generating amplicons of <3 kbp each. Illustrative
primers of the invention for amplifying exon 5 of the p53 gene are shown below (B is
biotin; F is fluorescein):

5'-B-CACTTGTGCCCTGACTTTCAAC-3' (SEQ. ID NO:288)
5'-F-CACTTGTGCCCTGACTTTCAAC-3'
5'-ATGCAATTAACCCTCACTAAAGGGAGACACTTGTGCCCTGACTTTCAAC-3' (SEQ. ID NO:289) (has T3 promoter)
5'-B-GACCCTGGGCAACCAGCCCTGTCGT-3' (SEQ. ID NO:290)
5'-F-GACCCTGGGCAACCAGCCCTGTCGT-3'
5'-TAATACGACTCACTATAGGGAGGACCCTGGGCAACCAGCCCTGTCGT-3' (SEQ. ID NO:291) (has T7 promoter)

After PCR amplification of the target (the amplified target is called the "amplicon"),
one strand of the amplicon can then be isolated, i.e., using a biotinylated primer that
allows capture of the undesired strand on streptavidin beads. Alternatively, asymmetric
PCR can be used to generate a single-stranded target. Another approach involves the
generation of single stranded RNA from the PCR product by
shown in FIG. 24. When the hybridization solution consisted 35 incorporating a T7 or other RNA polymerase promoter in 
of a 1:1 mixture of WT 12-mer and a 12-mer with a one of the primers. The single-stranded material can option- 
substitution in position 7 of the target, the sets of probes that ally be fragmented to generate smaller nucleic acids with 
were perfectly matched to both targets showed higher sig- less significant secondary structure than longer nucleic 
nals than the other probe sets. acids. 

The hybridization efficiency of a 10-mer probe array as 40 In one such method, fragmentation is combined with 
compared to a 12-mer probe array was also compared. The labeling. To illustrate, degenerate 8-mers or other degenerate 
10-mer and 12-mer probe arrays gave comparable signals short oligonucleotides arc hybridized to the single-stranded 
(see graphs 1-4 in FIG. 23 and graphs 1-4 in FIG. 25). target material. In the next step, a DNA polymerase is added 
However, the 10-mer probe sets, which are in rows 5-8 (see with the four different dideoxynucleotides, each labeled with 
images in FIG. 22), seemed to be better in this model system 45 a different fluorophore. Fluorophore-labeled dideoxynucle- 
than the 12-mer probe sets at resolving one target from otide are available from a variety of commercial suppliers, 
another, consistent with the expectation that one base mis- such as ABI. Hybridized 8-mers are extended by a labeled' 
matches are more destabilizing for 10-mers than 12-mers. dideoxynucleotide. After an optional purification step, Le., 
Hybridization results within probe sets perfectly matched to . with a size exclusion column, the labeled 9-mers arc hybrid- 
target also followed the expectation that, the more matches 50 ized to the chip. Other methods of target fragmentation can 
the individual probe formed with the target, the higher the be employed. The single-stranded DNA can be fragmented 
signal. However, duplexes with two 3' dangles (sec FIG. 23, by partial degradation with a DNAsc or partial depurination 
position 6 in graphs 1-4) have about as much signal as the with acid. Labeling can be accomplished in a separate step, 
probes which are matched along their entire length (see FIG. i.e., fluorophore-labeled nucleotides are incorporated before 
23, position 7, in graphs 1-4). 55 the fragmentation step or a DNA binding fluorophore, such 

This illustrative model system shows that 12-mer targets as ethidium homodimer, is attached to the target after 
that differ by one base substitutions can be readily distin- fragmentation. 

guished from one another by the novel probe array provided In one embodiment, the DNA chip has an array of 10 4 to 
by the invention and that resolution of the different 12-mer 10 5 probes tiling across the protein coding regions of p53, 
targets was somewhat better with the 10-mer probe sets than 60 which comprise about 1200 bp; smaller arrays specific for 
with the 12-mer probe sets. The value of having several the 600 bp mutational hot spot region arc also useful. The 
overlapping probes hybridizing to a target demonstrates the probes overlap for N-2 to N-4 bases, where N is the length 
value of the multiple hybridization events that take place on of the probe in bases. N is typically 10 to 14 bases long, but 
a DNA chip of the invention. The results also demonstrate as will be seen below, probes 15 to 19 bases and longer are 
the feasibility of constructing a probe set to sequence the 65 also useful. Every possible single base substitution occur- 
ence 1.4 kbp protein coding region of p53 or alternatively ring one at a time is represented in the array. The number of 
the 0.6 kbp of exons 5-9 containing mutation hot spots. unique 10-mer probes with 7 base overlaps would be about 
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(1200/3)x4xl0 or about 1.6x10". To allow 3 replicates of of DNA. First, the target DNA is amplified by PCR with 
each probe, odc might have a total array size on the order of primers allowing easy ligation into a vector, which is taken 
4.8x10 probes. Of course, arrays of probes within the up by transformation of E. colt which in turn must be 
ranges of 10* to 10* probes are also useful for applications; cultured, typically on plates overnight. After growth of the. 
for example, very large arrays of 10 a or more probes are s bacteria, DNA is purified in a procedure that typically takes 
; . useful for sequencing or sequence cheeking large genomic about 2 hours; then, the. sequencing reactions. are performed, 
DNAfragments. Optionally fragmented and labeled target which taies at least another hour, and the samples are run on 
nucleic acid hybridized to the chip is detected by a confocal the 8 el for several hours, the duration depending on the 
microscope or other imaging device. The pattern of sites ,eag,b ° f 106 & 4 S°ieo< be sequenced. By contrast, the 
"lighting up" with target is preferably analyzed with com- 10 f^ 0 ' P""*"? direct of we PCR ampli- 

puter assistance to provide the sequence of the target from fied a ~ J after br ! ef ^"P^o "d fragmentation 
the pattern of sites producing signals. s,e f* ?' yS ,* T aD . d hbor - 

The invention is illustrated below with examples of DNA - ?" mtenstm g dmical application for the characterization 

chips comprising very large arrays of DNAprobes to "rese- ? fbe . , ' f0 ? yg0 . u 1 s n"* 0 ™ wtl > DNA chips is as follows, 
quence" P S3 target nucleic acid in a sample. To analyze is Individuals with germbne cancer mutations have a very high 

DNA from exon 5 of the p53 tumor suppressor gene, a set ™* f °!J5f C0 " d f. rv ,umors f 11 " "eatment °y irradiation, 

of overlapping 17-mer probes was synthesized on a chip. Abo " 10 * of 3,1 J™ 6 " paUents n,av bave germline «nuu- 

The probes for the WT allele were synthesized so as to tie * 0a t- P ° f ' tumor , s yPP ressor g««- Thus, before 

across the entire exon with single base overlaps between dec .' d * g 00 a if"* 1 "! modaUt * 4 .P hvsi f ian could «* «»» 

probes. For each WT probe, a sets of 4 additional probes, 20 mcth< ? ud DNA chlps of lnventl0n «° fc st for a 

one for each possible base substitution at position 7, were g ^ b " ^T,- « en , e 1 mutatl0n -. w 

synthesized and placed in a column relative to the WTprobe. Dt £ °W* ipc . RaUo . nal Therapeutic Management ^ 

Exon 5 DNA was amplified by PCR with primers flanking u ^ present invention also provides DNA chips that can 

the exon. One of the primers was labeled with fluorescein' be by P h yf lcii « » determine optimum therapeutic 

the other primer was labeled with biotin. After amplification, 25 P"? 0001 * bv ra P ld detection of biologically mediated 

the biotinylated strand was removed by binding to strepta- I! s,s f ance L te a ! bera f e " bc age . nt m 8 varietv of disease sUtes - 

vidin beads. The fluoresceinated strand was used in hybrid- P, 6 be J K ^ ° f SUCb DNA f 1 "?? ■« fflaQ y. as the chips will 

Nation, help physicians recognize health care cost savings, achieve 

About 'A of the amplified, single-stranded nucleic acid , rapid ^P*" 1 .' 0 beM ^- u ' mit . administration of ineffective 
was hybridized overnight in 5xSSPE at 60° C. to the probe 30 < du ° ,0 ,he re , sistance ) yet lone drags, monitor changes in 
chip (under a cover slip). After washing with 6xSSPE, the '. pa,hogen r « istanc e. "d decrease pathogen acquisition of 
chip wasscanned using confocal microscopy. FIG. 26 shows . ^^e. Important appucatiohs_mcli.de the treatment of • 
an image of the p53 chip hybridized to L target DNA. Brother mfecuous diseases, and cancer. 
Analysis of the intensity data showed that 93.5% of the 184 ™Y b * mfected a expandmg number of people, 

bases of exon 5 were called in agreement with the WT 3S resu '^f m mlss, . ve heaItb care expenditures. HIV can 
sequence (see Buchman et si, 1988, Gene 70: 245-252 "P'dly become resistant to drugs used to treat the infection, 
incorporated herein by reference). The miscalled bases were l^'J^w"* 0 * ° f , he . tefodim 3 e ii c *«em <" 
from positions where probe signal intensities were tied *° HIV reverse transcriptase (RT) encoded by 

(1.6*) and where non-WT probes had the highest signal be " kb f? 1 g T|- ,' 8 eROr T £~1 P er roui ! d ). of 
intensity (4.9%). FIG. 27 illustrates how the actual sequence 40 * c "^ v V* 0 "" i h ^ y i^r rll !.". t A b,lit ^ 

was read. Gaps in the sequence of letters in the WT rows ^ n , ucIe °! 1 ,de »*- ^ ddI « ddC « and 

correspond to control probes or sites. Positions at which d4T f "»™»»y » •«« HIV infectionare converted to 
bases are miscalled are represented by letters in italic type in n " d<ot,d< an f al .°f <? by s f, < ' ueD, ' al Phosphorylation m the 
cells corresponding to probes in which the WT base/have <**Pl»» ° f ***** wb f". "corporation of the 
been substituted by other bases. as ana . ,ogu . e ut0 lhe waI DNA results 10 termination of viral 

As the diagram indicates, the miscalled bases are from the repb «" on . b « au * ^ phosphodiester linkage can- 
low intensity areas of the image, which may be due to n °' be H °* eV "' * " 6 ° oa,hs ,0 1 * c " 
secondary structure in the target or probes preventing inter- C J treatm . ent ' H Ff aUy nu,a,es ,be m genc 50 " ,0 
molecular hybridization. To diminish the effects due to be ~ Ble mca P ibIe of incorporating the analogue and so 
secondary structure, one can employ shorter targets (i.e., by SO ,0 f tK «™ at - Sevefal ^ ™™°<* irt-shown 
target fragmentation) or use more stringent bybridizatioo ,n UDUlar Ionn Delow - 

conditions. In addition, the use of a set of probes synthesized ■ 

by tiling across the other strand of a duplex target can also ft MtnxnoNS associated wrm drug resistance 

provide sequence information buned in secondary structure " — "~ ~ " ~ ~ ~ ~ ^— — ^ 

in the other strand. It should be appreciated, however, that 55 ANTt- 

the pattern of low intensity areas that forms as a result of VIRAL CODON n CHANGE m chance 
secondary structure in the target itself provides a means to 
identify that a specific target sequence is present in a sample. 
Other factors that may contribute to lower signal intensities 
include differences in probe densities and hybridization 60 
stabilities. 
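
The base-calling scheme described above for the exon 5 chip (a WT probe plus substitution probes at each interrogated position, with the base read from the probe giving the highest signal) can be sketched as follows. This is only a minimal illustration, not the analysis software used for FIG. 27: the function name `call_bases`, the per-position dictionary layout, and the intensity values are hypothetical. A position whose two strongest signals are tied is reported as "N", mirroring the tied-intensity case noted in the text.

```python
BASES = "ACGT"


def call_bases(intensities, wild_type):
    """Call one base per position from per-base probe intensities.

    intensities: list of dicts, one per target position, mapping each base
                 (A, C, G, T) to the signal of the probe carrying that base
                 at the interrogation position (hypothetical layout).
    wild_type:   reference sequence used to compute the agreement fraction.
    Returns (called_sequence, fraction_agreeing_with_wild_type).
    """
    calls = []
    for signals in intensities:
        # Rank the four probes at this position from brightest to dimmest.
        ranked = sorted(BASES, key=lambda b: signals[b], reverse=True)
        best, runner_up = ranked[0], ranked[1]
        if signals[best] == signals[runner_up]:
            calls.append("N")  # tied signals: no call is made
        else:
            calls.append(best)
    called = "".join(calls)
    agree = sum(1 for c, w in zip(called, wild_type) if c == w)
    return called, agree / len(wild_type)


# Illustrative (made-up) intensities for a four-base stretch.
example = [
    {"A": 900, "C": 120, "G": 100, "T": 80},
    {"A": 50, "C": 700, "G": 60, "T": 40},
    {"A": 300, "C": 300, "G": 20, "T": 10},   # tie between A and C
    {"A": 30, "C": 20, "G": 10, "T": 800},
]
print(call_bases(example, "ACGT"))
```

In this sketch a miscall can arise in the two ways the text distinguishes: a tie (reported here as "N") or a non-WT probe giving the highest signal.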

These results demonstrate the advantages provided by the 
DNA chips of the invention to genetic analysis. As another 

example, heterozygous mutations are curreotly sequenced .. „ „. ~~. ! T " ~~~! ! '. 

k „ „„, . . j -e .. N.B. o«ier nutttioai eoafer nxaaaa to other drugs in vilro 

by an arduous process involving cloning and repunncation 65 

of DNA. The cloning step is required, because the gel The present invention provides DNA chips for detecting 
sequencing systems are poor at resolving even a 1:1 mixture the multiple mutations in the Hi V RT gene associated with 
resistance to different therapeutics. These DNA chips will enable physicians to monitor mutations over time and to change therapeutics if resistance develops. The DNA chip will provide redundant confirmation of conserved HIV RT and other gene sequences, and the probes on the chip will tile through, with overlap, in important mutational hot spot regions. The chip will optionally have probes that span the entire coding region of the RT and optionally the genes for other HIV proteins, such as coat proteins. HIV target nucleic acid can be isolated from blood samples (peripheral blood lymphocytes or PBMC) and amplified by PCR, primers for which are shown in the table below.
AMPLIFICATION OF TARGET

TARGET SIZE   PRIMER 1                                          PRIMER 2
1,742 bp      GTAGAATTCTGTTGACTCAGATTGG (SEQ ID. NO:292)        GATAAGCTTGGGCCTTATCTATTCCAT (SEQ ID. NO:294)
535 bp        AAATCCATACAATACTCCAGTATTTGC (SEQ ID. NO:293)      ACCCATCCAAAGGAATGGAGGTTCTTTC (SEQ ID. NO:295)
323 bp        GenBank #K02013 1859-1903                         bases 2211-2192

The HIV RT gene chips of the invention, as well as the CF, mtDNA, and p53 DNA chips of the invention, illustrate the diverse application of the methods and probe arrays of the invention. The examples that follow describe methods for preparing nucleic acid targets from samples for application to the DNA chips of the invention and provide additional details of the methods of the invention.

EXAMPLES

I. VLSIPS™ Technology

As noted above, the VLSIPS™ technology is described in a number of patent publications and is preferred for making the oligonucleotide arrays of the invention. For completeness, a brief description of how this technology can be used to make and screen DNA chips is provided in this Example and the accompanying Figures. In the VLSIPS method, light is shone through a mask to activate functional (for oligonucleotides, typically an -OH) groups protected with a photoremovable protecting group on a surface of a solid support. After light activation, a nucleoside building block, itself protected with a photoremovable protecting group (at the 5'-OH), is coupled to the activated areas of the support. The process can be repeated, using different masks or mask orientations and building blocks, to prepare very dense arrays of many different oligonucleotide probes. The process is illustrated in FIG. 28; FIG. 29 illustrates how the process can be used to prepare "nucleoside combinatorials" or oligonucleotides synthesized by coupling all four nucleosides to form dimers, trimers, etc.

New methods for the combinatorial chemical synthesis of peptide, polycarbamate, and oligonucleotide arrays have recently been reported (see Fodor et al., 1991, Science 251: 767-773; Cho et al., 1993, Science 261: 1303-1305; and Southern et al., 1992, Genomics 13: 1008-1017, each of which is incorporated herein by reference). These arrays, or biological chips (see Fodor et al., 1993, Nature 364: 555-556, incorporated herein by reference), harbor specific chemical compounds at precise locations in a high-density, information rich format, and are a powerful tool for the study of biological recognition processes. A particularly exciting application of the array technology is in the field of DNA sequence analysis. The hybridization pattern of a DNA target to an array of shorter oligonucleotide probes is used to gain primary structure information of the DNA target. This format has important applications in sequencing by hybridization, DNA diagnostics and in elucidating the thermodynamic parameters affecting nucleic acid recognition.

Conventional DNA sequencing technology is a laborious procedure requiring electrophoretic size separation of labeled DNA fragments. An alternative approach, termed Sequencing By Hybridization (SBH), has been proposed (Lysov et al., 1988, Dokl. Akad. Nauk SSSR 303: 1508-1511; Bains et al., 1988, J. Theor. Biol. 135: 303-307; and Drmanac et al., 1989, Genomics 4: 114-128, incorporated herein by reference). This method uses a set of short oligonucleotide probes of defined sequence to search for complementary sequences on a longer target strand of DNA. The hybridization pattern is used to reconstruct the target DNA sequence. It is envisioned that hybridization analysis of large numbers of probes can be used to sequence long stretches of DNA. In immediate applications of this hybridization methodology, a small number of probes can be used to interrogate local DNA sequence.

The strategy of SBH can be illustrated by the following example. A 12-mer target DNA sequence, AGCCTAGCTGAA (SEQ. ID NO:296), is mixed with a complete set of octanucleotide probes. If only perfect complementarity is considered, five of the 65,536 octamer probes (TCGGATCG, CGGATCGA, GGATCGAC, GATCGACT, and ATCGACTT) will hybridize to the target. Alignment of the overlapping sequences from the hybridizing probes reconstructs the complement of the original 12-mer target:

TCGGATCG
 CGGATCGA
  GGATCGAC
   GATCGACT
    ATCGACTT
TCGGATCGACTT (SEQ. ID NO: )

Hybridization methodology can be carried out by attaching target DNA to a surface. The target is interrogated with a set of oligonucleotide probes, one at a time (see Strezoska et al., 1991, Proc. Natl. Acad. Sci. USA 88: 10089-10093, and Drmanac et al., 1993, Science 260: 1649-1652, each of which is incorporated herein by reference). This approach can be implemented with well established methods of immobilization and hybridization detection, but involves a large number of manipulations. For example, to probe a sequence utilizing a full set of octanucleotides, tens of thousands of hybridization reactions must be performed. Alternatively, SBH can be carried out by attaching probes to a surface in an array format where the identity of the probes at each site is known. The target DNA is then added to the array of probes. The hybridization pattern determined in a single experiment directly reveals the identity of all complementary probes.
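
The SBH example above (a 12-mer target read by overlapping octamer probes) can be reproduced with a short sketch. It is a minimal illustration under stated assumptions: only perfect matches are considered, the complement is written base by base in the same left-to-right order the text uses (no strand reversal), and every probe overlaps its neighbour unambiguously. The function names `complement`, `hybridizing_probes` and `reconstruct` are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def complement(seq):
    """Base-by-base complement, written in the same order as the text's example."""
    return "".join(COMPLEMENT[b] for b in seq)


def hybridizing_probes(target, probe_len=8):
    """Probes of length probe_len that are perfectly complementary to the target."""
    comp = complement(target)
    return [comp[i:i + probe_len] for i in range(len(comp) - probe_len + 1)]


def reconstruct(probes):
    """Chain probes that overlap by (probe length - 1) bases into one sequence.

    Raises ValueError if the chain is ambiguous or broken, which is the
    situation repeats in a real target would cause.
    """
    probes = list(probes)
    suffixes = {p[1:] for p in probes}
    # The first probe is the one whose leading bases no other probe ends with.
    starts = [p for p in probes if p[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("cannot identify a unique starting probe")
    current = starts[0]
    sequence = current
    remaining = set(probes) - {current}
    while remaining:
        nxt = [p for p in remaining if p[:-1] == current[1:]]
        if len(nxt) != 1:
            raise ValueError("ambiguous or broken overlap")
        current = nxt[0]
        sequence += current[-1]
        remaining.remove(current)
    return sequence


target = "AGCCTAGCTGAA"
probes = hybridizing_probes(target)
print(len(probes), "of", 4 ** 8, "octamers hybridize:", probes)
print(reconstruct(probes))
```

Shuffling `probes` before calling `reconstruct` gives the same result, since the chain is rebuilt from the overlaps rather than from the input order.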
As noted above, a preferred method of oligonucleotide probe array synthesis involves the use of light to direct the synthesis of oligonucleotide probes in high-density, miniaturized arrays. Photolabile 5'-protected N-acyl-deoxynucleoside phosphoramidites, surface linker chemistry, and versatile combinatorial synthesis strategies have been developed for this technology. Matrices of spatially-defined oligonucleotide probes have been generated, and the ability to use these arrays to identify complementary sequences has been demonstrated by hybridizing fluorescent labeled oligonucleotides to the DNA chips produced by the methods. The hybridization pattern demonstrates a high degree of base specificity and reveals the sequence of oligonucleotide targets.

The basic strategy for light-directed oligonucleotide synthesis (1) is outlined in FIG. 28. The surface of a solid support modified with photolabile protecting groups (X) is illuminated through a photolithographic mask, yielding reactive hydroxyl groups in the illuminated regions. A 3'-O-phosphoramidite activated deoxynucleoside (protected at the 5'-hydroxyl with a photolabile group) is then presented to the surface and coupling occurs at sites that were exposed to light. Following capping and oxidation, the substrate is rinsed and the surface illuminated through a second mask, to expose additional hydroxyl groups for coupling. A second 5'-protected, 3'-O-phosphoramidite activated deoxynucleoside is presented to the surface. The selective photodeprotection and coupling cycles are repeated until the desired set of products is obtained.

Light directed chemical synthesis lends itself to highly efficient synthesis strategies which will generate a maximum number of compounds in a minimum number of chemical steps. For example, the complete set of 4^n polynucleotides (length n), or any subset of this set, can be produced in only 4×n chemical steps. See FIG. 29. The patterns of illumination and the order of chemical reactants ultimately define the products and their locations. Because photolithography is used, the process can be miniaturized to generate high-density arrays of oligonucleotide probes. For an example of the nomenclature useful for describing such arrays, an array containing all possible octanucleotides of dA and dT is written as (A+T)^8. Expansion of this polynomial reveals the identity of all 256 octanucleotide probes from AAAAAAAA to TTTTTTTT. A DNA array composed of complete sets of dinucleotides is referred to as having a complexity of 2. The array given by (A+T+C+G)^8 is the full 65,536 octanucleotide array of complexity four.

To carry out hybridization of DNA targets to the probe arrays, the arrays are mounted in a thermostatically controlled hybridization chamber. Fluorescein labeled DNA targets are injected into the chamber and hybridization is allowed to proceed for ½ to 2 hours. The surface of the matrix is scanned in an epifluorescence microscope (Zeiss Axioscop 20) equipped with photon counting electronics using 50-100 µW of 488 nm excitation from an Argon ion laser (Spectra Physics model 2020). All measurements are acquired with the target solution in contact with the probe matrix. Photon counts are stored and image files are presented after conversion to an eight bit image format. See FIG. 33.

When hybridizing a DNA target to an oligonucleotide array, N = Lt - (Lp - 1) complementary hybrids are expected, where N is the number of hybrids, Lt is the length of the DNA target, and Lp is the length of the oligonucleotide probes on the array. For example, for an 11-mer hybridized to an octanucleotide array, N = 4. Hybridizations with mismatches at positions that are 2 to 3 residues from either end of the probes will generate detectable signals. Modifying the above expression for N, one arrives at a relationship estimating the number of detectable hybridizations (Nd) for a DNA target of length Lt and an array of complexity C. Assuming an average of 5 positions giving signals above background: Nd = (1 + 5(C - 1))[Lt - (Lp - 1)].

Arrays of oligonucleotides can be efficiently generated by light-directed synthesis and can be used to determine the identity of DNA target sequences. Because combinatorial strategies are used, the number of compounds increases exponentially while the number of chemical coupling cycles increases only linearly. For example, expanding the synthesis to the complete set of 4^8 (65,536) octanucleotides will add only four hours to the synthesis for the 16 additional cycles. Furthermore, combinatorial synthesis strategies can be implemented to generate arrays of any desired composition. For example, because the entire set of dodecamers (4^12) can be produced in 48 photolysis and coupling cycles (b^n compounds require b×n cycles), any subset of the dodecamers (including any subset of shorter oligonucleotides) can be constructed with the correct lithographic mask design in 48 or fewer chemical coupling steps. In addition, the number of compounds in an array is limited only by the density of synthesis sites and the overall array size. Recent experiments have demonstrated hybridization to probes synthesized in 25 µm sites. At this resolution, the entire set of 65,536 octanucleotides can be placed in an array measuring 0.64 cm square, and the set of 1,048,576 decanucleotides (4^10) requires only a 2.56 cm array.

Genome sequencing projects will ultimately be limited by DNA sequencing technologies. Current sequencing methodologies are highly reliant on complex procedures and require substantial manual effort. Sequencing by hybridization has the potential for transforming many of the manual efforts into more efficient and automated formats. Light-directed synthesis is an efficient means for large scale production of miniaturized arrays for SBH. The oligonucleotide arrays are not limited to primary sequencing applications. Because single base changes cause multiple changes in the hybridization pattern, the oligonucleotide arrays provide a powerful means to check the accuracy of previously elucidated DNA sequence, or to scan for changes within a sequence. In the case of octanucleotides, a single base change in the target DNA results in the loss of eight complements, and generates eight new complements. Matching of hybridization patterns may be useful in resolving sequencing ambiguities from standard gel techniques, or for rapidly detecting DNA mutational events. The potentially very high information content of light-directed oligonucleotide arrays will change genetic diagnostic testing. Sequence comparisons of hundreds to thousands of different genes will be assayed simultaneously instead of the current one, or few at a time format. Custom arrays can also be constructed to contain genetic markers for the rapid identification of a wide variety of pathogenic organisms.

Oligonucleotide arrays can also be applied to study the sequence specificity of RNA or protein-DNA interactions. Experiments can be designed to elucidate specificity rules of non Watson-Crick oligonucleotide structures or to investigate the use of novel synthetic nucleoside analogs for antisense or triple helix applications. Suitably protected RNA monomers may be employed for RNA synthesis. The oligonucleotide arrays should find broad application deducing the thermodynamic and kinetic rules governing formation and stability of oligonucleotide complexes.

Other than the use of photoremovable protecting groups, the nucleoside coupling chemistry is very similar to that
used routinely today for oligonucleotide synthesis. FIG. 30 shows the deprotection, coupling, and oxidation steps of a solid phase DNA synthesis method. FIG. 31 shows an illustrative synthesis route for the nucleoside building blocks used in the method. FIG. 32 shows a preferred photoremovable protecting group, MeNPOC, and how to prepare the group in active form. The procedures described below show how to prepare these reagents. The nucleoside building blocks are 5'-MeNPOC-THYMIDINE-3'-OCEP; 5'-MeNPOC-N4-t-BUTYL PHENOXYACETYL-DEOXYCYTIDINE-3'-OCEP; 5'-MeNPOC-N4-t-BUTYL PHENOXYACETYL-DEOXYGUANOSINE-3'-OCEP; and 5'-MeNPOC-N*-t-BUTYL PHENOXYACETYL-DEOXYADENOSINE-3'-OCEP.
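
The counting relationships stated earlier in this Example (b^n compounds from b×n coupling cycles, the array size at a given synthesis-site dimension, N = Lt - (Lp - 1), and Nd = (1 + 5(C - 1))[Lt - (Lp - 1)]) can be checked numerically with a short sketch. The function names and the `site_um` parameter are illustrative only, and the sketch assumes a square array and takes the complexity C to be the number of different bases used at each position, as in the (A+T+C+G)^8 example.

```python
import math


def compound_count(bases, length):
    """Number of distinct oligonucleotides of the given length (b^n)."""
    return bases ** length


def synthesis_cycles(bases, length):
    """Photolysis and coupling cycles needed for the full set (b x n)."""
    return bases * length


def array_side_cm(num_probes, site_um=25):
    """Edge length in cm of a square array with one probe per site_um-wide site."""
    side_sites = math.isqrt(num_probes)
    if side_sites * side_sites != num_probes:
        raise ValueError("probe count does not fill a square array")
    return side_sites * site_um / 10_000  # 10,000 micrometres per centimetre


def perfect_hybrids(target_len, probe_len):
    """N = Lt - (Lp - 1): perfectly complementary probes for one target."""
    return target_len - (probe_len - 1)


def detectable_hybrids(target_len, probe_len, complexity, positions=5):
    """Nd = (1 + 5(C - 1)) [Lt - (Lp - 1)], with 5 signal-giving mismatch positions."""
    return (1 + positions * (complexity - 1)) * perfect_hybrids(target_len, probe_len)


print(compound_count(4, 8), synthesis_cycles(4, 8))   # octanucleotides
print(synthesis_cycles(4, 12))                         # dodecamers
print(array_side_cm(4 ** 8), array_side_cm(4 ** 10))   # cm at 25 micrometre sites
print(perfect_hybrids(11, 8), detectable_hybrids(11, 8, 4))
```

With 25 µm sites the 65,536-probe array comes out at 0.64 cm on a side, and a 2.56 cm array corresponds to 1,048,576 (4^10) probes.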

A. Preparation of 4,5-methylenedioxy-2-nitroacetophenone

A solution of 50 g (0.305 mole) 3,4-methylenedioxyacetophenone (Aldrich) in 200 mL glacial acetic acid was added dropwise over 30 minutes to 700 mL of cold (2-4° C.) 70% HNO3 with stirring (NOTE: the reaction will overheat without external cooling from an ice bath, which can be dangerous and lead to side products. At temperatures below 0° C., however, the reaction can be sluggish. A temperature of 3°-5° C. seems to be optimal). The mixture was left stirring for another 60 minutes at 3°-5° C., and then allowed to approach ambient temperature. Analysis by TLC (25% EtOAc in hexane) indicated complete conversion of the starting material within 1-2 hr. When the reaction was complete, the mixture was poured into ~3 liters of crushed ice, and the resulting yellow solid was filtered off, washed with water and then suction-dried. Yield ~53 g (84%), used without further purification.

B. Preparation of 1-(4,5-methylenedioxy-2-nitrophenyl)ethanol

Sodium borohydride (10 g; 0.27 mol) was added slowly to a cold, stirring suspension of 53 g (0.25 mol) of 4,5-methylenedioxy-2-nitroacetophenone in 400 mL methanol. The temperature was kept below 10° C. by slow addition of the NaBH4 and external cooling with an ice bath. Stirring was continued at ambient temperature for another two hours, at which time TLC (CH2Cl2) indicated complete conversion of the ketone. The mixture was poured into one liter of ice-water and the resulting suspension was neutralized with ammonium chloride and then extracted three times with 400 mL CH2Cl2 or EtOAc (the product can be collected by filtration and washed at this point, but it is somewhat soluble in water and this results in a yield of only ~60%). The combined organic extracts were washed with brine, then dried with MgSO4 and evaporated. The crude product was purified from the main byproduct by dissolving it in a minimum volume of CH2Cl2 or THF (~175 ml) and then precipitating it by slowly adding hexane (1000 ml) while stirring (yield 51 g; 80% overall). It can also be recrystallized (e.g., toluene-hexane), but this reduces the yield.

C. Preparation of 1-(4,5-methylenedioxy-2-nitrophenyl)ethyl chloroformate (MeNPOC-Cl)

Phosgene (500 mL of 20% w/v in toluene from Fluka: 965 mmole; 4 eq.) was added slowly to a cold, stirring solution of 50 g (237 mmole; 1 eq.) of 1-(4,5-methylenedioxy-2-nitrophenyl)ethanol in 400 mL dry THF. The solution was stirred overnight at ambient temperature, at which point TLC (20% Et2O/hexane) indicated >95% conversion. The mixture was evaporated (an oil-less pump with downstream aqueous NaOH trap is recommended to remove the excess phosgene) to afford a viscous brown oil. Purification was effected by flash chromatography on a short (9×13 cm) column of silica gel eluted with 20% Et2O/hexane. Typically 55 g (85%) of the solid yellow MeNPOC-Cl is obtained by this procedure. The crude material has also been recrystallized in 2-3 crops from 1:1 ether/hexane. On this scale, ~100 ml is used for the first crop, with a few percent THF added to aid dissolution, and then cooling overnight at -20° C. (this procedure has not been optimized). The product should be stored desiccated at -20° C.

D. Synthesis of 5'-MeNPOC-2'-DEOXYNUCLEOSIDE-3'-(N,N-DIISOPROPYL 2-CYANOETHYL PHOSPHORAMIDITES)

(1) 5'-MeNPOC-Nucleosides

(Reaction scheme: base-protected 2'-deoxynucleoside treated with MeNPOC-Cl in pyridine to give the 5'-O-MeNPOC nucleoside.)



Base -THYMIDINE (T); N-4-ISOBUTYRYL 
60 2-DEOXYCYTIDINE (ibu-dQ; N-2-PHENOXYACETYL 
2'DEOXYGUANOSINE (PAC-dG); and N-6- 
PHENOXYACETYL 2'DEOXYADENOSINE (PAC-dA) 

All four of the 5'-MeNPOC nucleosides were prepared
from the base-protected 2'-deoxynucleosides by the following
procedure. The protected 2'-deoxynucleoside (90
mmole) was dried by co-evaporating twice with 250 mL
anhydrous pyridine. The nucleoside was then dissolved in
300 mL anhydrous pyridine (or 1:1 pyridine/DMF, for the
dG nucleoside) under argon and cooled to ~2° C. in an
ice bath. A solution of 24.6 g (90 mmole) MeNPOC-Cl in
100 mL dry THF was then added with stirring over 30
minutes. The ice bath was removed, and the solution allowed
to stir overnight at room temperature (TLC: 5-10% MeOH
in CH2Cl2; two diastereomers). After evaporating the solvents
under vacuum, the crude material was taken up in 250
mL ethyl acetate and extracted with saturated aqueous
NaHCO3 and brine. The organic phase was then dried over
Na2SO4, filtered and evaporated to obtain a yellow foam.
The crude products were finally purified by flash chromatography
(9x30 cm silica gel column eluted with a stepped
gradient of 2%-6% MeOH in CH2Cl2). Yields of the purified
diastereomeric mixtures are in the range of 65-75%.

(2) 5'-MeNPOC-2'-DEOXYNUCLEOSIDE-3'-(N,N-
DIISOPROPYL 2-CYANOETHYL
PHOSPHORAMIDITES)

[Reaction scheme: phosphitylation of the 5'-MeNPOC nucleoside; conditions shown: DIEA/DCM]

The four deoxynucleosides were phosphitylated using
either 2-cyanoethyl-N,N-diisopropyl
chlorophosphoramidite, or 2-cyanoethyl-N,N,N',N'-
tetraisopropylphosphorodiamidite. The following is a typical
procedure. Add 16.6 g (17.4 ml; 55 mmole) of
2-cyanoethyl-N,N,N',N'-tetraisopropylphosphorodiamidite
to a solution of 50 mmole 5'-MeNPOC-nucleoside and 4.3
g (25 mmole) diisopropylammonium tetrazolide in 250 mL
dry CH2Cl2 under argon at ambient temperature. Continue
stirring for 4-16 hours (reaction monitored by TLC:
45:45:10 hexane/CH2Cl2/Et3N). Wash the organic phase
with saturated aqueous NaHCO3 and brine, then dry over
Na2SO4, and evaporate to dryness. Purify the crude amidite
by flash chromatography (9x25 cm silica gel column eluted
with hexane/CH2Cl2/TEA, 45:45:10 for A, C, T; or 0:90:10
for G). The yield of purified amidite is about 90%.
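
The quantities in the typical phosphitylation procedure can be cross-checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the only inputs are the masses, volume and molar amounts quoted in the procedure. It reports the molar equivalents of each reagent relative to the nucleoside, and the molar mass and density implied by the stated figures.

```python
def equivalents(reagent_mmol, substrate_mmol):
    """Molar equivalents of a reagent relative to the limiting substrate."""
    return reagent_mmol / substrate_mmol


def implied_molar_mass(mass_g, amount_mmol):
    """Molar mass (g/mol) implied by a stated mass and molar amount."""
    return mass_g / (amount_mmol / 1000.0)


# Quantities quoted in the typical procedure.
nucleoside_mmol = 50.0
diamidite_g, diamidite_ml, diamidite_mmol = 16.6, 17.4, 55.0
tetrazolide_g, tetrazolide_mmol = 4.3, 25.0

diamidite_eq = equivalents(diamidite_mmol, nucleoside_mmol)
tetrazolide_eq = equivalents(tetrazolide_mmol, nucleoside_mmol)
diamidite_mw = implied_molar_mass(diamidite_g, diamidite_mmol)
tetrazolide_mw = implied_molar_mass(tetrazolide_g, tetrazolide_mmol)
diamidite_density = diamidite_g / diamidite_ml  # g/mL

print(f"phosphorodiamidite: {diamidite_eq:.2f} eq, "
      f"~{diamidite_mw:.0f} g/mol, ~{diamidite_density:.2f} g/mL")
print(f"tetrazolide: {tetrazolide_eq:.2f} eq, ~{tetrazolide_mw:.0f} g/mol")
```

On these figures the phosphorodiamidite is used in slight excess and the tetrazolide activator at half an equivalent.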
II. PREPARATION OF LABELED DNA/
HYBRIDIZATION TO ARRAY
1) PCR

PCR amplification reactions are typically conducted in a
mixture composed of, per reaction: 1 µl genomic DNA; 10 µl
each primer (10 pmol/µl stocks); 10 µl 10x PCR buffer (100
mM Tris.Cl pH 8.5, 500 mM KCl, 15 mM MgCl2); 10 µl 2
mM dNTPs (made from 100 mM dNTP stocks); 2.5 U Taq
polymerase (Perkin Elmer AmpliTaq™, 5 U/µl); and H2O to
100 µl. The cycling conditions are usually 40 cycles (94° C.
45 sec, 55° C. 30 sec, 72° C. 60 sec) but may need to be
varied considerably from sample type to sample type. These
conditions are for 0.2 mL thin wall tubes in a Perkin Elmer
9600 thermocycler. See Perkin Elmer 1992/93 catalogue for
9600 cycle time information. Target, primer length and
sequence composition, among other factors, may also affect
parameters.
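
The final concentrations in the 100 µl reaction follow from the stock concentrations and volumes given in the recipe. The Python sketch below is an illustration, not part of the described method; the helper name `final_conc` is hypothetical, and it relies on the identity that a 10 pmol/µl primer stock is 10 µM.

```python
def final_conc(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after dilution, in the same units as stock_conc."""
    return stock_conc * stock_volume_ul / total_volume_ul


TOTAL_UL = 100.0

# The 10x PCR buffer (10 µl per reaction) supplies Tris, KCl and MgCl2.
tris_mM = final_conc(100.0, 10.0, TOTAL_UL)
kcl_mM = final_conc(500.0, 10.0, TOTAL_UL)
mgcl2_mM = final_conc(15.0, 10.0, TOTAL_UL)

# 10 µl of 2 mM dNTPs; 10 µl of each 10 pmol/µl (10 µM) primer stock.
dntp_mM = final_conc(2.0, 10.0, TOTAL_UL)
primer_uM = final_conc(10.0, 10.0, TOTAL_UL)

# 2.5 U of Taq polymerase drawn from a 5 U/µl stock.
taq_ul = 2.5 / 5.0

print(f"Tris {tris_mM} mM, KCl {kcl_mM} mM, MgCl2 {mgcl2_mM} mM")
print(f"dNTPs {dntp_mM} mM each, primers {primer_uM} µM each")
print(f"Taq polymerase: {taq_ul} µl per reaction")
```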



For products in the 200 to 1000 bp size range, check 2 µl
of the reaction on a 1.5% 0.5x TBE agarose gel using an
appropriate size standard (phiX174 cut with HaeIII is
convenient). The PCR reaction should yield several picomoles
of product. It is helpful to include a negative control
(i.e., 1 µl TE instead of genomic DNA) to check for possible
contamination. To avoid contamination, keep PCR products
from previous experiments away from later reactions, using
filter tips as appropriate. Using a set of working solutions
and storing master solutions separately is helpful, so long as
one does not contaminate the master stock solutions.

For simple amplifications of short fragments from
genomic DNA it is, in general, unnecessary to optimize
Mg2+ concentrations. A good procedure is the following:
make a master mix minus enzyme; dispense the genomic
DNA samples to individual tubes or reaction wells; add
enzyme to the master mix; and mix and dispense the master
solution to each well, using a new filter tip each time.
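
The master-mix procedure can be scaled to any number of reactions. The Python sketch below is a hedged illustration based on the 100 µl recipe of this section; the dictionary keys, the function name `master_mix` and the 10% pipetting overage are assumptions introduced here, not values from the text. Genomic DNA is excluded because it is dispensed to each tube separately, and the enzyme line corresponds to the volume added to the mix last.

```python
TOTAL_UL = 100.0
TEMPLATE_UL = 1.0  # genomic DNA, dispensed to each tube separately

# Per-reaction volumes (µl) of the shared components.
SHARED_UL = {
    "primer 1 (10 pmol/µl)": 10.0,
    "primer 2 (10 pmol/µl)": 10.0,
    "10x PCR buffer": 10.0,
    "2 mM dNTPs": 10.0,
    "Taq polymerase (5 U/µl)": 2.5 / 5.0,  # 2.5 U per reaction, added last
}
# Water makes up the remainder of each 100 µl reaction.
SHARED_UL["H2O"] = TOTAL_UL - TEMPLATE_UL - sum(SHARED_UL.values())


def master_mix(n_reactions, overage=0.10):
    """Scale the shared components for n reactions plus a pipetting overage.

    The 10% default overage is an assumption, not a value from the text.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in SHARED_UL.items()}


mix = master_mix(8)
dispense_ul = TOTAL_UL - TEMPLATE_UL  # master mix added to each tube
for name, vol in mix.items():
    print(f"{name}: {vol} µl")
print(f"dispense {dispense_ul} µl of master mix per tube")
```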
2) PURIFICATION 

Removal of unincorporated nucleotides and primers from
PCR samples can be accomplished using the Promega
Magic PCR Preps DNA purification kit. One can purify the
whole sample, following the instructions supplied with the
kit (proceed from section MB, 'Sample preparation for
direct purification from PCR reactions'). After elution of the
PCR product in 50 µl of TE or H2O, one centrifuges the
eluate for 20 sec at 12,000 rpm in a microfuge and carefully
transfers 45 µl to a new microfuge tube, avoiding any visible
pellet. Resin is sometimes carried over during the elution
step. This transfer prevents accidental contamination of the
linear amplification reaction with 'Magic PCR' resin. Other
methods, e.g. size exclusion chromatography, may also be
used.

3) LINEAR AMPLIFICATION

In a 0.2 mL thin-wall PCR tube mix: 4 µl purified PCR
product; 2 µl primer (10 pmol/µl); 4 µl 10x PCR buffer; 4 µl
dNTPs (2 mM dA, dC, dG, 0.1 mM dT); 4 µl 0.1 mM dUTP;
1 µl 1 mM fluorescein dUTP (Amersham RPN 2121); 1 U
Taq polymerase (Perkin Elmer, 5 U/µl); and add H2O to 40
µl. Conduct 40 cycles (92° C. 30 sec, 55° C. 30 sec, 72° C.
90 sec) of PCR. These conditions have been used to amplify
a 300 nucleotide mitochondrial DNA fragment but are
generally applicable. Even in the absence of a visible
product band on an agarose gel, there should still be enough
product to give an easily detectable hybridization signal. If
one is not treating the DNA with uracil DNA glycosylase
(see Section 4), dUTP can be omitted from the reaction.
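
For planning purposes, the programmed hold time of a cycling protocol is simply the number of cycles multiplied by the sum of the step durations. The Python sketch below (the function name is hypothetical) applies this to the two cycling programs given in this section; it deliberately ignores ramp times between temperatures, which depend on the instrument, so actual run times will be longer.

```python
def total_hold_minutes(cycles, step_seconds):
    """Programmed hold time in minutes, excluding temperature ramps."""
    return cycles * sum(step_seconds) / 60.0


# Initial PCR: 40 cycles of 45 sec, 30 sec, 60 sec.
pcr_minutes = total_hold_minutes(40, [45, 30, 60])

# Linear amplification: 40 cycles of 30 sec, 30 sec, 90 sec.
linear_minutes = total_hold_minutes(40, [30, 30, 90])

print(f"PCR hold time: {pcr_minutes:.0f} min")
print(f"Linear amplification hold time: {linear_minutes:.0f} min")
```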

4) FRAGMENTATION 

Purify the linear amplification product using the Promega
Magic PCR Preps DNA purification kit, as per Section 2
above. In a 0.2 mL thin-wall PCR tube mix: 40 µl purified
labeled DNA; 4 µl 10x PCR buffer; and 0.5 µl uracil DNA
glycosylase (BRL, 1 U/µl). Incubate the mixture 15 min at
37° C., then 10 min at 97° C.; store at -20° C. until ready
to use.

5) HYBRIDIZATION, SCANNING & STRIPPING

A blank scan of the slide in hybridization buffer only is
helpful to check that the slide is ready for use. The buffer is
removed from the flow cell and replaced with 1 mL of
(fragmented) DNA in hybridization buffer and mixed well.
The scan is performed in the presence of the labeled target.
FIG. 33 shows an illustrative detection system for scanning
a DNA chip. A series of scans at 30 min intervals using
a hybridization temperature of 25° C. yields a very clear
signal, usually within 30 min to two hours, but it may be
desirable to hybridize longer, e.g., overnight. Using a laser
power of 50 µW and 50 µm pixels, one should obtain
maximum counts in the range of hundreds to low thousands/
pixel for a new slide. When finished, the slide can be
stripped using 50% formamide, rinsing well in deionized
H2O, blowing dry, and storing at room temperature.

III. PREPARATION OF LABELED RNA/
HYBRIDIZATION TO ARRAY
1) TAGGED PRIMERS

The primers used to amplify the target nucleic acid should
have promoter sequences if one desires to produce RNA
from the amplified nucleic acid. Suitable promoter
sequences are shown below and include:
(1) the T3 promoter sequence:
5'-CGGAATTAACCCTCACTAAAGG (SEQ. ID NO:298)
5'-AATTAACCCTCACTAAAGGGAG; (SEQ. ID NO:299)
(2) the T7 promoter sequence:
5'-TAATACGACTCACTATAGGGAG; (SEQ. ID NO:300)
and (3) the SP6 promoter sequence:
5'-ATTTAGGTGACACTATAGAA. (SEQ. ID NO:301)

The desired promoter sequence is added to the 5' end of the
PCR primer. It is convenient to add a different promoter to
each primer of a PCR primer pair so that either strand may
be transcribed from a single PCR product.

Synthesize PCR primers so as to leave the DMT group on.
DMT-on purification is unnecessary for PCR but appears to
be important for transcription. Add 25 µl 0.5M NaOH to
collection vial prior to collection of oligonucleotide to keep
the DMT group on. Deprotect using standard chemistry;
55° C. overnight is convenient.

HPLC purification is accomplished by drying down the
oligonucleotides, resuspending in 1 mL 0.1M TEAA (dilute
2.0M stock in deionized water, filter through 0.2 micron
filter) and filtering through a 0.2 micron filter. Load 0.5 mL on
reverse phase HPLC (column can be a Hamilton PRP-1
semi-prep, #79426). The gradient is 0-50% CH3CN over
25 min (program 0.2 µmol.prep.0-50, 25 min). Pool the
desired fractions, dry down, resuspend in 200 µl 80% HAc.
30 min RT. Add 200 µl EtOH; dry down. Resuspend in 200
µl H2O, plus 20 µl NaAc pH 5.5, 600 µl EtOH. Leave 10 min
on ice; centrifuge 12,000 rpm for 10 min in microfuge. Pour
off supernatant. Rinse pellet with 1 mL EtOH, dry, resuspend
in 200 µl H2O. Dry, resuspend in 200 µl TE. Measure A260,
prepare a 10 pmol/µl solution in TE (10 mM Tris.Cl pH 8.0,
0.1 mM EDTA). Following HPLC purification of a 42 mer,
a yield in the vicinity of 15 nmol from a 0.2 µmol scale
synthesis is typical.

2) GENOMIC DNA PREPARATION

For obtaining genomic DNA from human hair, one can
extract as few as 5 hairs, including hair roots. On a clean and
sterile surface, one places the hair on a piece of parafilm, and
after wiping a new razor blade with EtOH and cutting off the
roots, the roots are transferred to a 1.5 mL microfuge tube
using a pair of Millipore forceps cleaned with EtOH. Add
500 µl (10 mM Tris.Cl pH 8.0, 10 mM EDTA, 100 mM NaCl,
2% (w/v) SDS, 40 mM DTT, filter sterilized) to the sample.
Add 1.25 µl 20 mg/ml proteinase K (Boehringer). Incubate at
55° C. for 2 hours, vortexing once or twice. Perform 2x0.5
mL 1:1 phenol:CHCl3 extractions. After each extraction,
centrifuge 12,000 rpm 5 min in a microfuge and recover 0.4
mL supernatant. Add 35 µl NaAc pH 5.2 plus 1 mL EtOH.
Place sample on ice 45 min; then centrifuge 12,000 rpm 30
min, rinse, air dry 30 min, and resuspend in 100 µl TE.

3) PCR

PCR is performed in a mixture containing, per reaction: 1
µl genomic DNA; 4 µl each primer (10 pmol/µl stocks); 4 µl
10x PCR buffer (100 mM Tris.Cl pH 8.5, 500 mM KCl, 15
mM MgCl2); 4 µl 2 mM dNTPs (made from 100 mM dNTP
stocks); 1 U Taq polymerase (Perkin Elmer, 5 U/µl); H2O to
40 µl. About 40 cycles (94° C. 30 sec, 55° C. 30 sec, 72° C.
30 sec) are performed, but cycling conditions may need to
be varied. These conditions are for 0.2 mL thin wall tubes in
a Perkin Elmer 9600. For products in the 200 to 1000 bp size
range, check 2 µl of the reaction on a 1.5% 0.5x TBE agarose
gel using an appropriate size standard. For larger or smaller
volumes (20-100 µl), one can use the same amount of
genomic DNA but adjust the other ingredients accordingly.

4) IN VITRO TRANSCRIPTION

Mix: 3 µl PCR product; 4 µl 5x buffer; 2 µl DTT; 2.4 µl 10
mM rNTPs (100 mM solutions from Pharmacia); 0.48 µl 10
mM fluorescein-UTP (Fluorescein-12-UTP, 10 mM
solution, from Boehringer Mannheim); 0.3 µl RNA polymerase
(Promega T3 or T7 RNA polymerase); and add H2O
to 20 µl. Incubate at 37° C. for 3 h. Check 2 µl of the reaction
on a 1.5% 0.5x TBE agarose gel using a size standard.
5x buffer is 200 mM Tris pH 7.5, 30 mM MgCl2, 10 mM
spermidine, 50 mM NaCl, and 100 mM DTT (supplied with
enzyme). The PCR product needs no purification and can be
added directly to the transcription mixture. A 20 µl reaction
is suggested for an initial test experiment and hybridization;
a 100 µl reaction is considered "preparative" scale (the
reaction can be scaled up to obtain more target). The amount
of PCR product to add is variable; typically a PCR reaction
will yield several picomoles of DNA. If the PCR reaction
does not produce that much target, then one should increase
the amount of DNA added to the transcription reaction (as
well as optimize the PCR). The ratio of fluorescein-UTP to
UTP suggested above is 1:5, but ratios from 1:3 to 1:10 all
work well. One can also label with biotin-UTP and detect
with streptavidin-FITC to obtain similar results as with
fluorescein-UTP detection.

For nondenaturing agarose gel electrophoresis of RNA,
note that the RNA band will normally migrate somewhat
faster than the DNA template band, although sometimes the
two bands will comigrate. The temperature of the gel can
affect the migration of the RNA band. The RNA produced
from in vitro transcription is quite stable and can be stored
for months (at least) at -20° C. without any evidence of
degradation. It can be stored in unsterilized 6x SSPE/0.1%
Triton X-100 at -20° C. for days (at least) and reused twice
(at least) for hybridization, without taking any special
precautions in preparation or during use. RNase contamination
should of course be avoided. When extracting RNA from
cells, it is preferable to work very rapidly and to use strongly
denaturing conditions. Avoid using glassware previously
contaminated with RNases. Use of new disposable plasticware
(not necessarily sterilized) is preferred, as new
plastic tubes, tips, etc., are essentially RNase free. Treatment
with DEPC or autoclaving is typically not necessary.

5) FRAGMENTATION

In a 0.2 mL thin-wall PCR tube mix: 18 µl RNA (direct
from transcription reaction; no purification required); 18 µl
H2O; and 4 µl 1M Tris.Cl pH 9.0. Incubate at 99.9° C. for 60
min. Add to 1 mL hybridization buffer and store at -20° C.
until ready to use. The alkaline hydrolysis step is very
reliable. The hydrolysed target can be stored at -20° C. in
6x SSPE/0.1% Triton X-100 for at least several days prior to
use and can also be reused.

6) HYBRIDIZATION, SCANNING, & STRIPPING

A blank scan of the slide in hybridization buffer only is
helpful to check that the slide is ready for use. The buffer is
removed from the flow cell and replaced with 1 mL of
(hydrolysed) RNA in hybridization buffer and mixed well.
Incubate for 15-30 min at 18° C. Remove the hybridization
solution, which can be saved for subsequent experiments.
Rinse the flow cell 4-5 times with fresh changes of 6x SSPE/
0.1% Triton X-100, equilibrated to 18° C. The rinses can be
performed rapidly, but it is important to empty the flow cell
before each new rinse and to mix the liquid in the cell
thoroughly. The scan is performed in the presence of the
labeled target. A series of scans at 30 min intervals using a
hybridization temperature of 25° C. yields a very clear
signal, usually within 30 min to two hours, but it may be
desirable to hybridize longer, e.g., overnight. Using a laser
power of 50 µW and 50 µm pixels, one should obtain
maximum counts in the range of hundreds to low thousands/
pixel for a new slide. When finished, the slide can be
stripped using 50% to 100% formamide at 50° C. for 30 min,
rinsing well in deionized H2O, blowing dry, and storing at
room temperature.

These conditions are illustrative and assume a probe
length of ~15 nucleotides. The stripping conditions suggested
are fairly severe, but some signal may remain on the
slide if the washing is not stringent. Nevertheless, the counts
remaining after the wash should be very low in comparison
to the signal in presence of target RNA. In some cases, much
gentler stripping conditions are effective. The lower the
hybridization temperature and the longer the duration of
hybridization, the more difficult it is to strip the slide. Longer
targets may be more difficult to strip than shorter targets.
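
The labeling ratio used in the in vitro transcription mix of Section III can be checked, and adjusted within the stated 1:3 to 1:10 range, with a short calculation. The Python sketch below is illustrative only. It assumes that the mix contains 2.4 µl of 10 mM rNTPs and 0.48 µl of 10 mM fluorescein-UTP in 20 µl, which is the reading of the recipe that agrees with the stated 1:5 ratio; the function names are hypothetical.

```python
def nmol(conc_mM, volume_ul):
    """Amount in nmol delivered by a volume (µl) of a stock (mM)."""
    return conc_mM * volume_ul


def label_volume_ul(utp_mM, utp_ul, label_mM, utp_per_label):
    """Volume of labeled UTP stock giving a 1:utp_per_label ratio to UTP."""
    return nmol(utp_mM, utp_ul) / utp_per_label / label_mM


REACTION_UL = 20.0
UTP_MM, UTP_UL = 10.0, 2.4   # UTP is supplied by the 10 mM rNTP mix
LABEL_MM = 10.0              # fluorescein-UTP stock

# Ratio implied by adding 0.48 µl of fluorescein-UTP.
implied_ratio = nmol(UTP_MM, UTP_UL) / nmol(LABEL_MM, 0.48)

# Volumes for the ends of the range said to work well.
vol_1_to_3 = label_volume_ul(UTP_MM, UTP_UL, LABEL_MM, 3)
vol_1_to_10 = label_volume_ul(UTP_MM, UTP_UL, LABEL_MM, 10)

# Final rNTP concentration in the 20 µl reaction.
rntp_final_mM = UTP_MM * UTP_UL / REACTION_UL

print(f"fluorescein-UTP:UTP = 1:{implied_ratio:.1f}")
print(f"1:3 needs {vol_1_to_3:.2f} µl; 1:10 needs {vol_1_to_10:.2f} µl")
print(f"each rNTP at {rntp_final_mM:.1f} mM final")
```

For a 100 µl preparative reaction, every volume would simply be multiplied by five.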



SEQUENCE LISTING 



( 1 ) GENERAL INFORMATION:

( i i i ) NUMBER OF SEQUENCES: 360



( 2 ) INFORMATION FOR SEQ ID NO:1:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TTCCTCACGT CAGCC



( 2 ) INFORMATION FOR SEQ ID NO:2:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:2:

TTGCTGACAT CAGCC



( 2 ) INFORMATION FOR SEQ ID NO:3:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:3:

TTGCTGACCT CAGCC



( 2 ) INFORMATION FOR SEQ ID NO:4:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TTGCTGACTT CAGCC

( 2 ) INFORMATION FOR SEQ ID NO:5:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 39 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CATTAAACAA AATATCATCT TTGGTGTTTC CTATGATGA

( 2 ) INFORMATION FOR SEQ ID NO:6:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 36 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CATTAAAGAA AATATCATTG GTCTTTCCTA TGATGA

( 2 ) INFORMATION FOR SEQ ID NO:7:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 36 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:7:

CATTAAAGAA AATATCATTG GTGTTTCCTA TGATGA

( 2 ) INFORMATION FOR SEQ ID NO:8:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:8:

AACACCAATG ATGAT

( 2 ) INFORMATION FOR SEQ ID NO:9:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:9:

CCAAAGATNA TATTT 15



( 2 ) INFORMATION FOR SEQ ID NO:10:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:10:

ACCAAAGANG ATATT



( 2 ) INFORMATION FOR SEQ ID NO:11:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:11:

CACCAAAGNT GATAT



( 2 ) INFORMATION FOR SEQ ID NO:12:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:12:

ACACCAAANA TGATA



( 2 ) INFORMATION FOR SEQ ID NO:13:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:13:

AACACCAANG ATGAT



( 2 ) INFORMATION FOR SEQ ID NO:14:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:14:

AAACACCANA GATGA



( 2 ) INFORMATION FOR SEQ ID NO:15:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:15:

GAAACACCNA AGATG



( 2 ) INFORMATION FOR SEQ ID NO:16:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:16:

CCAAACACNA AAGAT



( 2 ) INFORMATION FOR SEQ ID NO:17:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:17:

AGGAAACANC AAAGA

( 2 ) INFORMATION FOR SEQ ID NO:18:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 21 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:18:

CCTTCAGAGG CTAAAATTAA G



( 2 ) INFORMATION FOR SEQ ID NO:19:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 21 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:19:

CCTTCAGAGT CTAAAATTAA C



( 2 ) INFORMATION FOR SEQ ID NO:20:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 44 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:20:

TAATACCACT CACTATACGG ACATOACCTA ATAATCATGC CTTT 44

( 2 ) INFORMATION FOR SEQ ID NO:21:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 43 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:21:

TAATACGACT CACTATAOCC AGTAGTGTGA ACCGTTCATA TGC 43

( 2 ) INFORMATION FOR SEQ ID NO:22:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 45 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:22:

CTCGGAATTA ACCCTCACTA AAGCTACTCT CAACOCTTCA TATGC 45

( 2 ) INFORMATION FOR SEQ ID NO:23:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 43 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:23:

TAATACGACT CACTATAGGG AGAGCATACT AAAAOTGACT CTC 43



( 2 ) INFORMATION FOR SEQ ID NO:24:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 44 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:24:

TAATACGACT CACTATAGGG AGACATGAAT GACATTTACA GCAA



( 2 ) INFORMATION FOR SEQ ID NO:25:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 44 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:25:

CGGAATTAAC CCTCACTAAA GGACATGAAT GACATTTACA GCAA



( 2 ) INFORMATION FOR SEQ ID NO:26:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:26:

TTTATOOCOT CA



( 2 ) INFORMATION FOR SEQ ID NO:27:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:27:

TTGATTTATC GC



( 2 ) INFORMATION FOR SEQ ID NO:28:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:28:

AACCTATTTG ATT



( 2 ) INFORMATION FOR SEQ ID NO:29:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:29:

CGACCAAACC TA



( 2 ) INFORMATION FOR SEQ ID NO:30:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:30:

AGGCTAGGAC CA



( 2 ) INFORMATION FOR SEQ ID NO:31:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:31:

GGTGTGTCTG TCC

( 2 ) INFORMATION FOR SEQ ID NO:32:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:32:

CGGTGTGTGT GTGC

( 2 ) INFORMATION FOR SEQ ID NO:33:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:33:

OGTCTCTCTG TGCT 14



( 2 ) INFORMATION FOR SEQ ID NO:34:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:34:

CTGGGTAGGA TG



( 2 ) INFORMATION FOR SEQ ID NO:35:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:35:

TGCTGGGTAG GA 12



( 2 ) INFORMATION FOR SEQ ID NO:36:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:36:

TOTCCTCCGT AC



( 2 ) INFORMATION FOR SEQ ID NO:37:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:37:

GTTAGCAGCG GT



( 2 ) INFORMATION FOR SEQ ID NO:38:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:38:

GGGTTAGCAG CG



( 2 ) INFORMATION FOR SEQ ID NO:39:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:39:

AGCCCCGGAC G



( 2 ) INFORMATION FOR SEQ ID NO:40:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:40:

AGCGGGGGAG



( 2 ) INFORMATION FOR SEQ ID NO:41:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:41:

GGTTGCTTCG G



( 2 ) INFORMATION FOR SEQ ID NO:42:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:42:

OCGTTTGGTT GG

( 2 ) INFORMATION FOR SEQ ID NO:43:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:43:

GATCTTTGGG GT 12



( 2 ) INFORMATION FOR SEQ ID NO:44:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:44:

GGGTGATCTT TG 12



( 2 ) INFORMATION FOR SEQ ID NO:45:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:45:

TGTGGGGGGT GA 12



( 2 ) INFORMATION FOR SEQ ID NO:46:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:46:

TAAACTGTGG GG 12



( 2 ) INFORMATION FOR SEQ ID NO:47:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:47:

GCTACATAAA CTG 13



( 2 ) INFORMATION FOR SEQ ID NO:48:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:48:

GAGGTAAGCT ACA 13



( 2 ) INFORMATION FOR SEQ ID NO:49:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:49:

GAGGAGCTAA GC 12



( 2 ) INFORMATION FOR SEQ ID NO:50:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:50:

TGCTTTGAGG AG 12



( 2 ) INFORMATION FOR SEQ ID NO:51:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:51:

AGTCTATTGC TTT 13



( 2 ) INFORMATION FOR SEQ ID NO:52:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:52:

CATTTTCAGT CTA

( 2 ) INFORMATION FOR SEQ ID NO:53:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:53:

TAAACATTTT CAG



( 2 ) INFORMATION FOR SEQ ID NO:54:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:54:

AGCCCGTCTA AA 12



( 2 ) INFORMATION FOR SEQ ID NO:55:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:55:

GAGCCCGTCT AA 12



( 2 ) INFORMATION FOR SEQ ID NO:56:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:56:

TCATOTCACC CC 12



( 2 ) INFORMATION FOR SEQ ID NO:57:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:57:

GGGGTGATGT GA 12



( 2 ) INFORMATION FOR SEQ ID NO:58:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:58:

CAGTCGGAGC C 11



( 2 ) INFORMATION FOR SEQ ID NO:59:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:59:

GTATGGGAGT GG 12



( 2 ) INFORMATION FOR SEQ ID NO:60:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:60:

GATTAGTAGT ATGG 14



( 2 ) INFORMATION FOR SEQ ID NO:61:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:61:

TGAATGAGAT TAG 13



( 2 ) INFORMATION FOR SEQ ID NO:62:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:62:

ATTGAATGAG ATT 13



( 2 ) INFORMATION FOR SEQ ID NO:63:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:63:

COCTTCTATT GAA



( 2 ) INFORMATION FOR SEQ ID NO:64:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:64:

GCCCGOCTTC 10



( 2 ) INFORMATION FOR SEQ ID NO:65:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:65:

ATGGCCGGGC



( 2 ) INFORMATION FOR SEQ ID NO:66:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:66:

TAGGATGGGC G 11



( 2 ) INFORMATION FOR SEQ ID NO:67:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:67:

TGGOTAGGAT CG 12



( 2 ) INFORMATION FOR SEQ ID NO:68:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:68:

CTCCTCCGTA CG 12



( 2 ) INFORMATION FOR SEQ ID NO:69:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:69:

TGTGTGTGCT GG 12



( 2 ) INFORMATION FOR SEQ ID NO:70:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:70:

GCGGTGTGTG TG 12



( 2 ) INFORMATION FOR SEQ ID NO:71:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:71:

TAGCAGCGGT GT



( 2 ) INFORMATION FOR SEQ ID NO:72:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:72:

TGGGGTTAGC AG 12



( 2 ) INFORMATION FOR SEQ ID NO:73:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:73:

GGTATCGGGT TA 12



( 2 ) INFORMATION FOR SEQ ID NO:74:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:74:

GTTCGGGGTA TG



( 2 ) INFORMATION FOR SEQ ID NO:75:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:75:

OCTCCTCTTA GG



( 2 ) INFORMATION FOR SEQ ID NO:76:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:76:

GGTTAGGCTG GT



( 2 ) INFORMATION FOR SEQ ID NO:77:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:77:

AAATCTGGTT AGO



( 2 ) INFORMATION FOR SEQ ID NO:78:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:78:

AAATTTGAAA TCT



( 2 ) INFORMATION FOR SEQ ID NO:79:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:79:

AAGATAAAAT TTC 13



( 2 ) INFORMATION FOR SEQ ID NO:80:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:80:

GCCAAAAAGA TA 12



( 2 ) INFORMATION FOR SEQ ID NO:81:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:81:

CGCCAAAAAG A



( 2 ) INFORMATION FOR SEQ ID NO:82:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:82:

CATACCGCCA A 11



( 2 ) INFORMATION FOR SEQ ID NO:83:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:83:

AAAAGTGCAT ACC 13



( 2 ) INFORMATION FOR SEQ ID NO:84:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:84:

TOTTAAAAGT OCA



( 2 ) INFORMATION FOR SEQ ID NO:85:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:85:

GGGTGACTGT TAA



( 2 ) INFORMATION FOR SEQ ID NO:86:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:86:

CGCCCTGACT GT



( 2 ) INFORMATION FOR SEQ ID NO:87:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:87:

AGTTGGGGGG T



( 2 ) INFORMATION FOR SEQ ID NO:88:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:88:

TGTGTTAGTT GGG



( 2 ) INFORMATION FOR SEQ ID NO:89:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:89:

AAAATAATCT CTT



( 2 ) INFORMATION FOR SEQ ID NO:90:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:90:

AOCCCAAAAT AA



( 2 ) INFORMATION FOR SEQ ID NO:91:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:91:

GGACGOGAAA AT



( 2 ) INFORMATION FOR SEQ ID NO:92:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:92:

GCAAATTTTT TG



( 2 ) INFORMATION FOR SEQ ID NO:93:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:93:

GGTGOAAATT TT



( 2 ) INFORMATION FOR SEQ ID NO:94:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:94:

GGTTTGGTGG A



( 2 ) INFORMATION FOR SEQ ID NO:95:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:95:
> CACCCOCOOT f 

( 2 ) INFORMATION FOR SEQ ID NO:96:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:96:

CCGGGGCAGG



( 2 ) INFORMATION FOR SEQ ID NO:97:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:97:

CAGAAGCGGG G 11



( 2 ) INFORMATION FOR SEQ ID NO:98:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:98:

GTAGGCCAGA AG 12



( 2 ) INFORMATION FOR SEQ ID NO:99:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:99:

OTGCTGTAGG CC 12



( 2 ) INFORMATION FOR SEQ ID NO:100:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:100:

TCTTTAACTC CTO



( 2 ) INFORMATION FOR SEQ ID NO:101:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:101:

TCTCTTTAAG TGC



( 2 ) INFORMATION FOR SEQ ID NO:102:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:102:

GCAGAGATCT GTT



( 2 ) INFORMATION FOR SEQ ID NO:103:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:103:

TTTGGCAGAG AT



( 2 ) INFORMATION FOR SEQ ID NO:104:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:104:

GGGGTTTGGC A



( 2 ) INFORMATION FOR SEQ ID NO:105:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:105:

TGTTTTTGGG GT



( 2 ) INFORMATION FOR SEQ ID NO:106:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:106:

TTTCTTTTTG GC



( 2 ) INFORMATION FOR SEQ ID NO:107:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:107:

GGGTTCTTTG TT 12



( 2 ) INFORMATION FOR SEQ ID NO:108:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:108:

GTGTTAGOGT TCT 13



( 2 ) INFORMATION FOR SEQ ID NO:109:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:109:

TTTACTAAGT ATGT 14



( 2 ) INFORMATION FOR SEQ ID NO:110:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:110:

AACACACTTT ACT 13



( 2 ) INFORMATION FOR SEQ ID NO:111:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:111:

AATTAATTAA CACA

( 2 ) INFORMATION FOR SEQ ID NO:112:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:112:

AAOCATTAAT TAA

( 2 ) INFORMATION FOR SEQ ID NO:113:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:113:

CTCCTACAAC CAT

( 2 ) INFORMATION FOR SEQ ID NO:114:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:114:

TGTCCTACAA GCA



( 2 ) INFORMATION FOR SEQ ID NO:115:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:115:

ATTATTATGT CCT 13



( 2 ) INFORMATION FOR SEQ ID NO:116:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:116:

TTCTTATTAT TATC 14



( 2 ) INFORMATION FOR SEQ ID NO:117:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:117:

ATTCAAATTG TTA



( 2 ) INFORMATION FOR SEQ ID NO:118:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:118:

CCAOACATTC AAA 13



( 2 ) INFORMATION FOR SEQ ID NO:119:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:119:

CCTGTCCAGA CA 12



( 2 ) INFORMATION FOR SEQ ID NO:120:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:120:

AAACTCCCTC TG



( 2 ) INFORMATION FOR SEQ ID NO:121:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:121:

TGTGTGGAAA GTG



( 2 ) INFORMATION FOR SEQ ID NO:122:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:122:

GATGTCTOTO TGG 13



( 2 ) INFORMATION FOR SEQ ID NO:123:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:123:

ATGATGTCTG TGT



( 2 ) INFORMATION FOR SEQ ID NO:124:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:124:

TTTTGTTATG ATG 13



( 2 ) INFORMATION FOR SEQ ID NO:125:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:125:

TTTTTTGTTA TGA 13



( 2 ) INFORMATION FOR SEQ ID NO:126:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:126:

ATAGGGTGCT CC 12



( 2 ) INFORMATION FOR SEQ ID NO:127:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:127:

CCCACATAOO CT 12

( 2 ) INFORMATION FOR SEQ ID NO:128:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:128:

TACTCCOACA TAG 13



( 2 ) INFORMATION FOR SEQ ID NO:129:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:129:

CACACATACT OCC



( 2 ) INFORMATION FOR SEQ ID NO:130:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:130:

AATCAAAGAC AGA 13



( 2 ) INFORMATION FOR SEQ ID NO:131:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:131:

AGGAATCAAA GAC 13



( 2 ) INFORMATION FOR SEQ ID NO:132:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:132:






TCACCCACGA AT 

( 2 ) INFORMATION FOR SEQ ID NO:133:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:133:

ACGATCACCC AG



( 2 ) INFORMATION FOR SEQ ID NO:134:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:134:

AAATAATAGG ATG 13



( 2 ) INFORMATION FOR SEQ ID NO:135:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:135:



( 2 ) INFORMATION FOR SEQ ID NO:136:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:136:



( 2 ) INFORMATION FOR SEQ ID NO:137:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:137:

GTAGGATGCG AT



( 2 ) INFORMATION FOR SEQ ID NO:138:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:138:

TTGAACGTAG GA



( 2 ) INFORMATION FOR SEQ ID NO:139:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:139:

AATATTGAAC GTA 13



( 2 ) INFORMATION FOR SEQ ID NO:140:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:140:

GCCTGTAATA TTG



( 2 ) INFORMATION FOR SEQ ID NO:141:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:141:

TGTTCGCCTG TA 12



( 2 ) INFORMATION FOR SEQ ID NO:142:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:142:

GTATGTTCGC CT 12



( 2 ) INFORMATION FOR SEQ ID NO:143:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:143:

CTCCCGTCAC TO 12



( 2 ) INFORMATION FOR SEQ ID NO:144:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:144:

GAGAOCTCCC GT



( 2 ) INFORMATION FOR SEQ ID NO:145:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:145:

ATCCAGACCT CC 12



( 2 ) INFORMATION FOR SEQ ID NO:146:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:146:

AATGCATGGA GA 12



( 2 ) INFORMATION FOR SEQ ID NO:147:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:147:

ATACCAAATG CA 12



( 2 ) INFORMATION FOR SEQ ID NO:148:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:148:

CACCAAAATA CCA

( 2 ) INFORMATION FOR SEQ ID NO:149:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:149:

CCCACACCAA A

( 2 ) INFORMATION FOR SEQ ID NO:150:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:150:

TACCCCCCAC A

( 2 ) INFORMATION FOR SEQ ID NO:151:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:151:

TGCATACCCC C

( 2 ) INFORMATION FOR SEQ ID NO:152:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:152:

TCCCpTCCAT AC






( 2 ) INFORMATION FOR SEQ ID NO:153:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:153:

GACTATCGCG TG 12



( 2 ) INFORMATION FOR SEQ ID NO:154:






( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:154:

ATGACTATCC CO 12



( 2 ) INFORMATION FOR SEQ ID NO:155:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:155:

CTCCCAATGA CT



( 2 ) INFORMATION FOR SEQ ID NO:156:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:156:

CGTCTCGCAA TG 12



( 2 ) INFORMATION FOR SEQ ID NO:157:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:157:

CTCCAGCGTC TC 12



( 2 ) INFORMATION FOR SEQ ID NO:158:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:158:

TCCGGCTCCA G 11



( 2 ) INFORMATION FOR SEQ ID NO:159:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:159:
GTCCTCCCGC r •.' ■ - 

( 2 ) INFORMATION FOR SEQ ID NO:160:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:160:

CACCCTCAAG TAG



( 2 ) INFORMATION FOR SEQ ID NO:161:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:161:

TTTATCACCC TGA 13



( 2 ) INFORMATION FOR SEQ ID NO:162:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:162:

TTTAGGCTTT ATG 13



( 2 ) INFORMATION FOR SEQ ID NO:163:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:163:

CCTATTTAGO CT



( 2 ) INFORMATION FOR SEQ ID NO:164:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:164:




TGGGCTATTT AG 



( 2 ) INFORMATION FOR SEQ ID NO:165:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:165:

ACGTGTGCGC TA



( 2 ) INFORMATION FOR SEQ ID NO:166:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:166:

AGGGGAACGT GT



( 2 ) INFORMATION FOR SEQ ID NO:167:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:167:

TTTAAGGGGA AC



( 2 ) INFORMATION FOR SEQ ID NO:168:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:168:

ATGTCTTATT TAAG



( 2 ) INFORMATION FOR SEQ ID NO:169:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:169:

CATCGTGATG TCT 



( 2 ) INFORMATION FOR SEQ ID NO:170:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:170:

TCCATCCTGA TC 



1 2 



( 2 ) INFORMATION FOR SEQ ID NO:171:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:171:

GATGATCCAT CC 



1 2 



( 2 ) INFORMATION FOR SEQ ID NO:172:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:172:

AGACCTGATG ATC



( 2 ) INFORMATION FOR SEQ ID NO:173:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:173:

GGGTGATAGA CCT



1 3 



( 2 ) INFORMATION FOR SEQ ID NO:174:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:174:

ATAGGGTGAT AGA



1 3 



( 2 ) INFORMATION FOR SEQ ID NO:175:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:175:

TOGTTAATAq.CC 



( 2 ) INFORMATION FOR SEQ ID NO:176:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:176:

GTGAGTGGTT AAT



( 2 ) INFORMATION FOR SEQ ID NO:177:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:177:

TOTOCCGCAT AT 



( 2 ) INFORMATION FOR SEQ ID NO:178:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:178:

ACTCTTGTCC GG 



( 2 ) INFORMATION FOR SEQ ID NO:179:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:179:

TAGCACTCTT CTG



( 2 ) INFORMATION FOR SEQ ID NO:180:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:180:

OGAGAOTACC ACT



( 2 ) INFORMATION FOR SEQ ID NO:181:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:181:

CCCACCACAG TA 



( 2 ) INFORMATION FOR SEQ ID NO:182:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:182:

CCCAGCGACG A 



( 2 ) INFORMATION FOR SEQ ID NO:183:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:183:

GGCCCCGAGC 



( 2 ) INFORMATION FOR SEQ ID NO:184:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:184:

TTATGGGCCC G 



( 2 ) INFORMATION FOR SEQ ID NO:185:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:185:

AGTGTTATGG GC 



( 2 ) INFORMATION FOR SEQ ID NO:186:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:186:

TACCCCCAAG TC 



( 2 ) INFORMATION FOR SEQ ID NO:187:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:187:

TTTACCTACC CC 



( 2 ) INFORMATION FOR SEQ ID NO:188:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:188:

TTCACTTTAG CTA



( 2 ) INFORMATION FOR SEQ ID NO:189:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:189:

TACAGTTCAC TTT



( 2 ) INFORMATION FOR SEQ ID NO:190:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:190:

TCGAGATACA GTT



( 2 ) INFORMATION FOR SEQ ID NO:191:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:191:

CAGATCTCGA GAT



( 2 ) INFORMATION FOR SEQ ID NO:192:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:192:

AGGAACCAGA TG 



( 2 ) INFORMATION FOR SEQ ID NO:193:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:193:

GAACTACCAA CCA 



( 2 ) INFORMATION FOR SEQ ID NO:194:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:194:

GACTGTAATG TGC



( 2 ) INFORMATION FOR SEQ ID NO:195:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:195:

OOCATTTCAC TGT 



( 2 ) INFORMATION FOR SEQ ID NO:196:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:196:

AGGOATTTGA CT



( 2 ) INFORMATION FOR SEQ ID NO:197:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:197:

ACGACAACGC AT 



( 2 ) INFORMATION FOR SEQ ID NO:198:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:198:

TGGGGACGAG AA



( 2 ) INFORMATION FOR SEQ ID NO:199:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:199:

ATCCATGGGG AC 



( 2 ) INFORMATION FOR SEQ ID NO:200:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:200:

GGTCATCCAT GG 



( 2 ) INFORMATION FOR SEQ ID NO:201:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:201:

AGGGGGGTCA T 



( 2 ) INFORMATION FOR SEQ ID NO:202:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:202:

TATCTCAGGG GC 

( 2 ) INFORMATION FOR SEQ ID NO:203:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:203:

ACCCCTATCT GA

( 2 ) INFORMATION FOR SEQ ID NO:204:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:204:

AGGGACCCCT A 



( 2 ) INFORMATION FOR SEQ ID NO:205:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:205:

TGGTCAAGCG AC 



1 2 



( 2 ) INFORMATION FOR SEQ ID NO:206:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:206:

GGATGGTGGT CA 12



( 2 ) INFORMATION FOR SEQ ID NO:207:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:207:

- AOGATCOTGC :tC 



( 2 ) INFORMATION FOR SEQ ID NO:208:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:208:

ACACCOACCA TO 



( 2 ) INFORMATION FOR SEQ ID NO:209:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:209:

TCATTTACAC GG 



( 2 ) INFORMATION FOR SEQ ID NO:210:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:210:

CCCATATTOA TTT 



( 2 ) INFORMATION FOR SEQ ID NO:211:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:211:

GTGGCATTTG GA 



( 2 ) INFORMATION FOR SEQ ID NO:212:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:212:

ACCCCTCCCA T

( 2 ) INFORMATION FOR SEQ ID NO:213:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:213:

COTCAGGOCT G 



( 2 ) INFORMATION FOR SEQ ID NO:214:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:214:

AGTGGGTGAG GG 



1 2 



( 2 ) INFORMATION FOR SEQ ID NO:215:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:215:

GTATCCTAGT COG 



( 2 ) INFORMATION FOR SEQ ID NO:216:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:216:

TTTGTTGGTA TCC 13



( 2 ) INFORMATION FOR SEQ ID NO:217:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:217:

GTAGGTTTGT TGG 13 



( 2 ) INFORMATION FOR SEQ ID NO:218:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:218:

TCGGTACGTT TG 



1 2 



( 2 ) INFORMATION FOR SEQ ID NO:219:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:219:

TAAGGGTGGG TA 



l 2 



( 2 ) INFORMATION FOR SEQ ID NO:220:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:220:

GTACTGTTAA GGG 

( 2 ) INFORMATION FOR SEQ ID NO:221:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 14 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:221:

TGTACTATGT ACTG



( 2 ) INFORMATION FOR SEQ ID NO:222:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:222:

GGCTTTATGT ACT 



1 3 



( 2 ) INFORMATION FOR SEQ ID NO:223:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:223:

AAATGGCTTT AT



( 2 ) INFORMATION FOR SEQ ID NO:224:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:224:

GGTAAATGGC TT 



( 2 ) INFORMATION FOR SEQ ID NO:225:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:225:

TCTACGGTAA ATG 13



( 2 ) INFORMATION FOR SEQ ID NO:226:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:226:

GTGCTAATGT ACG 13



( 2 ) INFORMATION FOR SEQ ID NO:227:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:227:

TAATGTGCTA ATG 13



( 2 ) INFORMATION FOR SEQ ID NO:228:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:228:

CATCCGCACC G



( 2 ) INFORMATION FOR SEQ ID NO:229:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:229:

TGTAAGCATG GG 



( 2 ) INFORMATION FOR SEQ ID NO:230:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:230:

TTCCTTGTAA CCA 



( 2 ) INFORMATION FOR SEQ ID NO:231:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:231:

TGTACTTGCT TGT 



( 2 ) INFORMATION FOR SEQ ID NO:232:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:232:

TTGCTGTACT TGC



( 2 ) INFORMATION FOR SEQ ID NO:233:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:233:

CGTTCATTGC TG



( 2 ) INFORMATION FOR SEQ ID NO:234:




( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:234:

TTOAGCCTTG AT 



( 2 ) INFORMATION FOR SEQ ID NO:235:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:235:

GTGATAGTTG AGG 



( 2 ) INFORMATION FOR SEQ ID NO:236:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:236:

TTGATGTGTG ATA 



( 2 ) INFORMATION FOR SEQ ID NO:237:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:237:

TGCAGTTGAT GTG



( 2 ) INFORMATION FOR SEQ ID NO:238:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:238:

TCCACTTGCA GT 



( 2 ) INFORMATION FOR SEQ ID NO:239:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:239:

ATTTGGAGTT GC



( 2 ) INFORMATION FOR SEQ ID NO:240:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:240:

TACCGTACAA TAT 



( 2 ) INFORMATION FOR SEQ ID NO:241:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:241:

TCCTACCCTA CAA



( 2 ) INFORMATION FOR SEQ ID NO:242:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:242:

TATTTATCGT ACC 



( 2 ) INFORMATION FOR SEQ ID NO:243:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:243:

GGTCAAGTAT TTA 



( 2 ) INFORMATION FOR SEQ ID NO:244:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:244:

TACAGGTGGT CAA



( 2 ) INFORMATION FOR SEQ ID NO:245:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:245:

ATGTACTACA GGT 

( 2 ) INFORMATION FOR SEQ ID NO:246:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:246:

CCTTTTTATC TAC 

( 2 ) INFORMATION FOR SEQ ID NO:247:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:247:

GCATTCGGTT TT 



1 3 



1 3 



1 2 



( 2 ) INFORMATION FOR SEQ ID NO:248:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:248:

TOTAOCATTC CO 



1 2 



( 2 ) INFORMATION FOR SEQ ID NO:249:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:249:

GTTTTCATGT AGG 



1 3 



( 2 ) INFORMATION FOR SEQ ID NO:250:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:250:

CGGTTTTCAT CT 



( 2 ) INFORMATION FOR SEQ ID NO:251:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 11 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:251:

GGAGGGGGTT T 



( 2 ) INFORMATION FOR SEQ ID NO:252:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:252:

GTCAATACTT GGG 



( 2 ) INFORMATION FOR SEQ ID NO:253:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:253:

GGGTGAGTCA ATA



( 2 ) INFORMATION FOR SEQ ID NO:254:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:254:

TGGGTGAGTC AA



( 2 ) INFORMATION FOR SEQ ID NO:255:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:255:

TO TTCATGGG T G 



( 2 ) INFORMATION FOR SEQ ID NO:256:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:256:

CGGTTGTTGA TG 



( 2 ) INFORMATION FOR SEQ ID NO:257:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: U base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:257:

ACATAOCOCT TG 



( 2 ) INFORMATION FOR SEQ ID NO:258:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:258:

CAAAATACAT AGC 



( 2 ) INFORMATION FOR SEQ ID NO:259:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:259:

AATGTACCAA AAT 



( 2 ) INFORMATION FOR SEQ ID NO:260:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:260:

OCACTAATCT ACC



( 2 ) INFORMATION FOR SEQ ID NO:261:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:261:

TOCCTGCCAG TA 



( 2 ) INFORMATION FOR SEQ ID NO:262:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:262:

TCATGGTGGC TG 



( 2 ) INFORMATION FOR SEQ ID NO:263:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:263:

ACAATATTCA TGG



( 2 ) INFORMATION FOR SEQ ID NO:264:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:264:

TACAATCTTA OCT 



( 2 ) INFORMATION FOR SEQ ID NO:265:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:265:

TTTAAATTAG AAT



( 2 ) INFORMATION FOR SEQ ID NO:266:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:266:

CAATAAOTTT AAA 



( 2 ) INFORMATION FOR SEQ ID NO:267:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:267:

CAACACACAA TAA 



( 2 ) INFORMATION FOR SEQ ID NO:268:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:268:



( 2 ) INFORMATION FOR SEQ ID NO:269:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:269:

CCCATGAAAG AA 12



( 2 ) INFORMATION FOR SEQ ID NO:270:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:270:

TTCCCCATGA AA 12



( 2 ) INFORMATION FOR SEQ ID NO:271:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:271:

ATCTGCTTCC CC



( 2 ) INFORMATION FOR SEQ ID NO:272:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:272:

CAAATCTGCT TC 



( 2 ) INFORMATION FOR SEQ ID NO:273:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:273:

CGTACCCAAA TC 

( 2 ) INFORMATION FOR SEQ ID NO:274:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:274:

GGTGGTACCC AA 



( 2 ) INFORMATION FOR SEQ ID NO:275:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:275:

TACTTGGGTG GT 



( 2 ) INFORMATION FOR SEQ ID NO:276:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:276:



( 2 ) INFORMATION FOR SEQ ID NO:277:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:277:

CTCCTTGGAA AA



( 2 ) INFORMATION FOR SEQ ID NO:278:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:278:



( 2 ) INFORMATION FOR SEQ ID NO:279:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:279:



( 2 ) INFORMATION FOR SEQ ID NO:280:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:280:

TTTTTCTCTG ATT 



( 2 ) INFORMATION FOR SEQ ID NO:281:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:281:

TAAAGACTTT TTC 13



( 2 ) INFORMATION FOR SEQ ID NO:282:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:282:

GTGCACTTAA ACA



( 2 ) INFORMATION FOR SEQ ID NO:283:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:283:

TCGTCCAGTT AAA



( 2 ) INFORMATION FOR SEQ ID NO:284:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:284:

TGCTAATGGT GG



( 2 ) INFORMATION FOR SEQ ID NO:285:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:285:

TTGGGTGCTA AT



( 2 ) INFORMATION FOR SEQ ID NO:286:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:286:

TAGCTTTGGG TG 



( 2 ) INFORMATION FOR SEQ ID NO:287:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:287:

TCTTAOCTTT GC

( 2 ) INFORMATION FOR SEQ ID NO:288:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 22 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:288:

CACTTCTGCC CTGACTTTCA AC 

( 2 ) INFORMATION FOR SEQ ID NO:289:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 49 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:289:

ATGCAATTAA CCCTCACTAA AGCCAGACAC TTCTCCCCTC ACTTTCAAC 49 

( 2 ) INFORMATION FOR SEQ ID NO:290:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:290:

GACCCTGGGC AACCACCCCT GTCGT 25

( 2 ) INFORMATION FOR SEQ ID NO:291:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 47 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:291:

TAATACGACT CACTATACCO AGGACCCTGG GCAACCAGCC CTGTCGT 47

( 2 ) INFORMATION FOR SEQ ID NO:292:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 25 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:292:

CTAGAATTCT CTTCACTCAC ATTCC



( 2 ) INFORMATION FOR SEQ ID NO:293:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 27 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:293:

AAATCCATAC AATACTCCAO TATTTCC 



( 2 ) INFORMATION FOR SEQ ID NO:294:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 27 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:294:

OATAACCTTC CCCCTTATCT ATTCCAT 



( 2 ) INFORMATION FOR SEQ ID NO:295:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 23 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:295:

ACCCATCCAA ACCAATCOAC GTTCTTTC 



( 2 ) INFORMATION FOR SEQ ID NO:296:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:296:

AGCCTAGCTG AA

( 2 ) INFORMATION FOR SEQ ID NO:297:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:297:

TCCGATCGAC TT 



( 2 ) INFORMATION FOR SEQ ID NO:298:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 22 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:298:

CCOAATTAAC CCTCACTAAA GC 

( 2 ) INFORMATION FOR SEQ ID NO:299:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 22 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:299:

AATTAACCCT CACTAAAGGG AG



( 2 ) INFORMATION FOR SEQ ID NO:300:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 22 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:300:

TAATACGACT CACTATACCO AG 



( 2 ) INFORMATION FOR SEQ ID NO:301:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:301:

ATTTACGTGA CACTATAGAA 20



( 2 ) INFORMATION FOR SEQ ID NO:302:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:302:

GATNATATTT 10 



( 2 ) INFORMATION FOR SEQ ID NO:303:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:303:

AGANCATATT



( 2 ) INFORMATION FOR SEQ ID NO:304:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:304:

AACKTCATAT 



( 2 ) INFORMATION FOR SEQ ID NO:305:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:305:

AAANATCATA



( 2 ) INFORMATION FOR SEQ ID NO:306:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:306:

CAANOATCAT 



( 2 ) INFORMATION FOR SEQ ID NO:307:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:307:

CCAKACATCA 



( 2 ) INFORMATION FOR SEQ ID NO:308:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:308:

ACCNAACATC

( 2 ) INFORMATION FOR SEQ ID NO:309:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:309:

CACKAAACAT 

( 2 ) INFORMATION FOR SEQ ID NO:310:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:310:

ACAAACNACA



( 2 ) INFORMATION FOR SEQ ID NO:311:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 16 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:311:

ATTTCATTCT GTATTC 16



( 2 ) INFORMATION FOR SEQ ID NO:312:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 16 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:312:

CCGACTCCAG TCGTTA 16



( 2 ) INFORMATION FOR SEQ ID NO:313:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:313:

CCGACTCCAG TCGTT 15 



( 2 ) INFORMATION FOR SEQ ID NO:314:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 13 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:314:

CCCACTACAC TCOTT 



( 2 ) INFORMATION FOR SEQ ID NO:315:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:315:

CCGACTCCAG TCGTT 



( 2 ) INFORMATION FOR SEQ ID NO:316:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 15 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:316:

CCCACTTCAG TCGTT 



( 2 ) INFORMATION FOR SEQ ID NO:317:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:317:

GTAATTTCTT TTATAGTAGA AACCACAAAG GATAC



( 2 ) INFORMATION FOR SEQ ID NO:318:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:318:

CATTAAAGAA AATATCATCT TTGGTGTTTC CTATG 



( 2 ) INFORMATION FOR SEQ ID NO:319:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 32 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:319:

CATTAAAOAA AATATCATTG OJCTTTCCTA TG

( 2 ) INFORMATION FOR SEQ ID NO:320:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 18 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:320:

CATTAAACAA AATATCAT 

( 2 ) INFORMATION FOR SEQ ID NO:321:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:321:

TATTAAAOAA AATATCATCT TTGGTGTTTC CTATC

( 2 ) INFORMATION FOR SEQ ID NO:322:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:322:

CCTTAAAGAA AATATCATCT TTGGTGTTTC CTAAA 

( 2 ) INFORMATION FOR SEQ ID NO:323:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:323:

CTTTAAACAA AATAAAAAAA TTGOTGTTTC CTAAA



3 2 



3 5 



3 5 



3 5 



( 2 ) INFORMATION FOR SEQ ID NO:324:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:324:

GGAAGTCTCC CATTTTAATT 20



( 2 ) INFORMATION FOR SEQ ID NO:325:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:325:

CCTTCAGAGG GTAAAATTAA 20



( 2 ) INFORMATION FOR SEQ ID NO:326:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:326:

CCTTCAGAGK GTAAAATTAA 20



( 2 ) INFORMATION FOR SEQ ID NO:327:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 20 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:327:

CCTTCAGAGT GTAAAATTAA



( 2 ) INFORMATION FOR SEQ ID NO:328:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 19 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:328:

CCTTCAGAGG GTAAAATCA 19



( 2 ) INFORMATION FOR SEQ ID NO:329:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 19 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:329:

CCTTCAGAGG GTAAAATTA 19



( 2 ) INFORMATION FOR SEQ ID NO:330:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 19 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:330:

GATTCACAOT CTAAAATAC 19



( 2 ) INFORMATION FOR SEQ ID NO:331:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 19 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:331:

AAAAAACACT CTAAAATCA 19



( 2 ) INFORMATION FOR SEQ ID NO:332:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 35 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:332:

CATTAAAGAA AATAACATCA TTGGTGTTTC CTATG 35



( 2 ) INFORMATION FOR SEQ ID NO:333:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 648 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:333:



AACAAACCTA CCCACCCTTA ACAGTACATA GTACATAAAG CCATTTACCG TACATAGCAC 60
ATTACAGTCA AATCCCTTCT CGTCCCCATG GATGACCCCC CTCAGATAGG GGTCCCTTGA 120
CCACCATCCT CCGTGAAATC AATATCCCCC ACAAGAGTGC TACTCTCCTC GCTCCGGGCC 180
CATAACACTT GGGGGTAGCT AAAGTGAACT GTATCCGACA TCTCCTTCCT ACTTCAGGGT 240
CATAAAGCCT AAATAGCCCA CACGTTCCCC TTAAATAAGA CATCACGATG GATCACAGGT 300
CTATCACCCT ATTAACCACT CACGGGAGCT CTCCATGCAT TTGGTATTTT CGTCTGGGGG 360
GTATOCACOC GATAGCATTG CGAGACGCTO GAGCCGGAGC ACCCTATGTC GCAGTATCTG 420
TCTTTGATTC CTGCCTCATC CTATTATTTA TCGCACCTAC GTTCAATATT ACAGGCGAAC 480
ATACTTACTA AAGTGTGTTA ATTAATTAAT GCTTGTAGGA CATAATAATA ACAATTGAAT 540
GTCTGCACAG CCACTTTCCA CACACACATC ATAACAAAAA ATTTCCACCA AACCCCCCCT 600
CTCCCCCCCT TCTCGCCACA GCACTTAAAC ACATCTCTGC CAAACCCC 648



( 2 ) INFORMATION FOR SEQ ID NO:334:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:334:

CATCCTOACO AG 

( 2 ) INFORMATION FOR SEQ ID NO:335:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:335:

CTCCTCCCCG GT 12



( 2 ) INFORMATION FOR SEQ ID NO:336:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:336:

ACTCCTCCCC GG 12



( 2 ) INFORMATION FOR SEQ ID NO:337:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:337:

GACTCCTCCC CO 12 



( 2 ) INFORMATION FOR SEQ ID NO:338:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:338:

CGACTCCTCC CC 12 



( 2 ) INFORMATION FOR SEQ ID NO:339:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:339:

ACCACTCCTC CC



( 2 ) INFORMATION FOR SEQ ID NO:340:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:340:

TACGACTCCT CC 



( 2 ) INFORMATION FOR SEQ ID NO:341:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:341:

CTACGACTCC TC 12 



( 2 ) INFORMATION FOR SEQ ID NO:342:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:342:

TCTACGACTC CT 12 



( 2 ) INFORMATION FOR SEQ ID NO:343:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:343:

TTCTACGACT CC 12



( 2 ) INFORMATION FOR SEQ ID NO:344:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:344:

ATTCTACCAC TC



( 2 ) INFORMATION FOR SEQ ID NO:345:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:345:

TATTCTACGA CT 



( 2 ) INFORMATION FOR SEQ ID NO:346:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:346:

CTATTCTACC AC 



( 2 ) INFORMATION FOR SEQ ID NO:347:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 12 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:347:

CCTATTCTAC GA



( 2 ) INFORMATION FOR SEQ ID NO:348:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:348:

TCCTCCCCGG 



( 2 ) INFORMATION FOR SEQ ID NO:349:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:349:

CTCCTCCCCG 



( 2 ) INFORMATION FOR SEQ ID NO:350:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:350:

ACTCCTCCCC 

( 2 ) INFORMATION FOR SEQ ID NO:351:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:351:

OACTCCTCCC 10



( 2 ) INFORMATION FOR SEQ ID NO:352:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:352:



( 2 ) INFORMATION FOR SEQ ID NO:353:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:353:

ACGACTCCTC 



( 2 ) INFORMATION FOR SEQ ID NO:354:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:354:

TACOACTCCT 10 



( 2 ) INFORMATION FOR SEQ ID NO:355:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:355:

CTACCACTCC



( 2 ) INFORMATION FOR SEQ ID NO:356:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:356:

TCTACGACTC 



( 2 ) INFORMATION FOR SEQ ID NO:357:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:357:

TTCTACCACT 



( 2 ) INFORMATION FOR SEQ ID NO:358:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:358:

ATTCTACGAC



( 2 ) INFORMATION FOR SEQ ID NO:359:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 10 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (probe)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:359:

TATTCTACGA



( 2 ) INFORMATION FOR SEQ ID NO:360:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 184 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (oligonucleotide)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:360:

TACTCCCCTG CCCTCAACAA CATGTTTTCC CAACTGOCCA AOACCTCCCC TCTCCACCWQ 60
KGGGWWGATT CCACACCCCC GCCCGGCACC CGCGTCCGCG CCATGCCCAT CTACAAGCAG 120
TCACAOCACA TCACCGACOW;WGKCACCCOC TCCC^ 



SAYG 



18 0 
1 8 i 



We claim:

1. An array of oligonucleotide probes immobilized on a solid support, [illegible] probes and no more than 100,000 different oligonucleotide probes 9 to 20 [illegible] array, said oligonucleotide probes comprising at least four sets of probes: (1) a first set that is exactly complementary to a reference sequence and comprises probes that completely span the reference sequence and, relative to the reference sequence, overlap one another in sequence; and (2) three additional sets of probes, each of which is identical to said first set of probes but for at least one different nucleotide, which different nucleotide is located in the same position in each of the three additional sets but which is a different nucleotide in each set.

2. The array of claim 1, further comprising a fourth additional set of probes, which fourth additional set is identical to probes in the first set.

3. The array of claim [illegible], wherein said reference sequence [illegible] nucleic acid and probes complementary [illegible] said array.

4. [illegible]

5. The array of claim 4, wherein said probes are 15 [illegible] covalent linkage to [illegible] 3'-end of said probes, and said different nucleotide [illegible].

6. The array of claim 1, wherein said reference sequence [illegible] CFTR [illegible], said array has between 1000 [illegible] oligonucleotide probes 10 to 18 nucleotides in [illegible].

7. The array of claim [illegible], wherein said array comprises a set of probes comprising a specific nucleotide sequence selected [illegible]:
[illegible]
[illegible] (SEQ ID. NO:307);
[illegible] (SEQ ID. NO:308);
[illegible] (SEQ ID. NO:309); and
[illegible] (SEQ ID. NO:310); wherein each set comprises 4 probes, and X is individually A, G, C, and T.

8. [illegible] group of sequences consisting of:
[illegible] (SEQ ID. NO:9);
[illegible] (SEQ ID. NO:10);
3'-TATAGTXGAAACCAC (SEQ ID. NO:11);
3'-ATAGTAXAAACCACA (SEQ ID. NO:12);
3'-TAGTAGXAACCACAA (SEQ ID. NO:13);
3'-AGTAGAXACCACAAA (SEQ ID. NO:14);
3'-GTAGAAXCCACAAAG (SEQ ID. NO:15);
3'-TAGAAAXCACAAAGG (SEQ ID. NO:16); and
[illegible] (SEQ ID. NO:17); wherein each set comprises 4 probes, and X is individually A, C, [illegible].

9. The array of claim [illegible], wherein said reference sequence [illegible] of a D-loop region of human mitochondrial DNA.

10. The array of claim 9, wherein said probes are 15 nucleotides in length, and said array comprises a first set of probes exactly complementary to a sequence contained in a sequence bounded by positions 16280 to 356 of the reference sequence and four additional sets of probes identical to said first set but for position 7, relative to a 3'-end of a probe which 3'-end is covalently attached to the substrate, where for each of the four additional probe sets, a different nucleotide is located, such that, for each probe in said first set there is an identical probe in one of the four additional sets, [illegible].

11. The array of claim 1, wherein said reference sequence is a sequence from an exon of a human p53 gene.

12. The array of claim 11, wherein said reference sequence is from exon 6 of a p53 gene.

13. The array of claim 11, wherein said reference sequence is exon 5 of a p53 gene, said probes are [illegible] nucleotides long, and said array comprises a first set of probes exactly complementary to said sequence and at least three additional sets of probes, each set comprising probes [illegible] attached to the substrate, which nucleotide is different from [illegible].

14. The array of claim [illegible], wherein said probes are oligodeoxyribonucleotides.

15. The array of claim 1, wherein said array has between 10,000 and 100,000 probes.

16. The array of claim 1, wherein the reference sequence is from a human immunodeficiency virus.

17. The array of claim 16, wherein the reference sequence is from a reverse transcriptase gene of the human immunodeficiency virus.

18. The array of claim [illegible], wherein said probes are immobilized to said solid support via a linker.
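
The probe layout recited in claim 1 (a first set of overlapping probes exactly complementary to the reference, plus three further sets that differ from it at one fixed position) can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the patentees' method: the function name `tile_probe_sets`, its parameters, and the one-base step between probes are hypothetical choices, and the defaults of 15 nucleotides and position 7 are borrowed from claim 10.

```python
# Illustrative sketch only. Names, defaults, and the one-base step are
# assumptions made for this example; they are not taken from the patent.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def tile_probe_sets(reference, probe_len=15, sub_pos=7):
    """Build a first probe set plus three substitution sets for `reference`.

    `reference` is read 5'->3'. Probes are written 3'->5', so position 1 is
    the 3' end, which is how the claims count positions.
    """
    if not 1 <= sub_pos <= probe_len <= len(reference):
        raise ValueError("need 1 <= sub_pos <= probe_len <= len(reference)")
    idx = sub_pos - 1
    first_set = []
    extra_sets = [[], [], []]
    # Step one base at a time so that consecutive probes overlap and the
    # probes together span the whole reference.
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        # Base-by-base complement of a 5'->3' window reads 3'->5'.
        probe = "".join(COMPLEMENT[base] for base in window)
        first_set.append(probe)
        # The three bases that differ from the exact-match base at idx.
        others = [base for base in BASES if base != probe[idx]]
        for extra, base in zip(extra_sets, others):
            extra.append(probe[:idx] + base + probe[idx + 1:])
    return first_set, extra_sets


if __name__ == "__main__":
    first, extras = tile_probe_sets("ACGTACGTAC", probe_len=4, sub_pos=2)
    for i, probe in enumerate(first):
        print(probe, [s[i] for s in extras])
```

At each tiling position the four probes carry all four nucleotides at the interrogated position, which is what allows a hybridization comparison across the four sets to indicate the base present in the target.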
ABSTRACT



Oligonucleotide analogue arrays attached to solid substrates 
and methods related to the use thereof are provided. The 
oligonucleotide analogues hybridize to nucleic acids with 
either higher or lower specificity than corresponding 
unmodified oligonucleotides. Target nucleic acids which 
comprise nucleotide analogues are bound to oligonucleotide 
and oligonucleotide analogue arrays.

ARRAYS OF MODIFIED NUCLEIC ACID PROBES AND METHODS OF USE

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation-in-part of U.S. Ser. No. 08/440,742 filed May 10, 1995 abandoned, which is a continuation-in-part of PCT application (designating the United States) SN PCT/US94/12305 filed Oct. 26, 1994, which is a continuation-in-part of U.S. Ser. No. 08/284,064 filed Aug. 2, 1994 abandoned, which is a continuation-in-part of U.S. Ser. No. 08/143,312 filed Oct. 26, 1993 abandoned, each of which is incorporated herein by reference in its entirety for all purposes.

FIELD OF THE INVENTION

The present invention provides probes comprised of nucleotide analogues immobilized in arrays on solid substrates for analyzing molecular interactions of biological interest, and target nucleic acids comprised of nucleotide analogues. The invention therefore relates to the molecular interaction of polymers immobilized on solid substrates including related chemistry, biology, and medical diagnostic uses.

BACKGROUND OF THE INVENTION

The development of very large scale immobilized polymer synthesis (VLSIPS™) technology provides pioneering methods for arranging large numbers of oligonucleotide probes in very small arrays. See, U.S. application Ser. No. 07/805,727 now U.S. Pat. No. 5,424,186 and PCT patent publication Nos. WO 90/15070 and 92/10092, each of which is incorporated herein by reference for all purposes. U.S. patent application Ser. No. 08/082,937, filed Jun. 25, 1993, and incorporated herein for all purposes, describes methods for making arrays of oligonucleotide probes that are used, e.g., to determine the complete sequence of a target nucleic acid and/or to detect the presence of a nucleic acid with a specified sequence.

density of more than 100 members at known locations per cm², or more preferably, more than 1000 members per cm². In some embodiments, the arrays have a density of more than 10,000 members per cm².

The solid substrate upon which the array is constructed includes any material upon which oligonucleotide analogues are attached in a defined relationship to one another, such as beads, arrays, and slides. Especially preferred oligonucleotide analogues of the array are between about 5 and about 20 nucleotides, nucleotide analogues or a mixture thereof in length.

In one group of embodiments, nucleoside analogues incorporated into the oligonucleotide analogues of the array will have the chemical formula:

wherein R¹ and R² are independently selected from the group consisting of hydrogen, methyl, hydroxy, alkoxy (e.g., methoxy, ethoxy, propoxy, allyloxy, and propargyloxy), alkylthio, halogen (Fluorine, Chlorine, and Bromine), cyano, and azido, and wherein Y is a heterocyclic moiety, e.g., a base selected from the group of purines, purine analogues, pyrimidines, pyrimidine analogues, universal bases (e.g. 5-nitroindole) or other groups or ring systems capable of forming one or more hydrogen bonds with corresponding moieties on alternate strands within a double- or triple-stranded nucleic acid or nucleic acid analogue, or other groups or ring systems capable of forming nearest-neighbor base-stacking interactions within a double- or triple-stranded complex. In other embodiments, the oligonucleotide analogues are not constructed from nucleosides, but are capable of binding to nucleic acids in solution due to structural similarities between the oligonucleotide analogue and a naturally occurring nucleic acid.

, „ ™™™ . » . , • n An example of such an oligonucleotide analogue is a peptide 

VLSIPS™ technology provides an efficient means for ^ ! ide nucleic acid m which bases which 

large scale production of mimatunzed oligonucleotide h d [ ^ iK , 0 a , ide 

arrays for sequencing by hybridization (SBH), diagnostic backbone 

testing for inherited or somatically acquired genetic 4J ^ ^ ^ ides ^ nuclejc ^ 

diseases, and forensic analysis. Other applications include h bridized t0 O i igonucleolide arrays. In the target nucleic 
determination ot_sequence specialty ot nuc.e.c acids, acidsofthemvention , nucleotide 

analogues are incorporated 

protem-nucle.c acid complexes and other polymer-polymer ^ ^ (arget add> al(efing ^ hybridizatioD prop . 

m erac ons. erties of the target nucleic acid to an array of oligonucleotide 

SUMMARY OF THE INVENTION 50 P robes ; Typically, the oligonucleotide probe arrays also 

comprise nucleotide analogues. 

The present invention provides arrays of oligonucleotide The target nucleic acids are typically synthesized by 
analogues attached to solid substrates. Oligonucleotide ana- providing a nucleotide analogue as a reagent during the 
logues have different hybridization properties than oligo- enzymatic copying of a nucleic acid. For instance, nucle- 
nucleotides based upon naturally occurring nucleotides. By 55 otide analogues are incorporated into polynucleic acid ana- 
incorporating oligonucleotide analogues into the arrays of logues using taq polymerase in a PCR reaction. Thus, a 
the invention, hybridization to a target nucleic acid is nucleic acid containing a sequence to be analyzed is typi- 
optimized. cally amplified in a PCR or RNA amplification procedure 

The oligonucleotide analogue arrays have virtually any with nucleotide analogues, and the resulting target nucleic 
number of different members, determined largely by the 60 acid analogue amplicon is hybridized to a nucleic acid 
number or variety of compounds to be screened against the analogue array. 

array in a given application. In one group of embodiments, Oligonucleotide analogue arrays and target nucleic acids 

the array has from 10 up to 100 oligonucleotide analogue are optionally composed of oligonucleotide analogues 
members. In other groups of embodiments, the arrays have which are resistant to hydrolysis or degradation by nuclease 
between 100 and 10,000 members, and in yet other embodi- 65 enzymes such as RNAase A. This has the advantage of 
ments the arrays have between 10,000 and 1,000,0000 providing the array or target nucleic acid with greater 
members. In preferred embodiments, the array will have a longevity by rendering it resistant to enzymatic degradation. 
For example, analogues comprising
2'-O-methyloligoribonucleotides are resistant to RNAase A.

Oligonucleotide analogue arrays are optionally arranged
into libraries for screening compounds for desired
characteristics, such as the ability to bind a specified
oligonucleotide analogue, or oligonucleotide analogue-containing
structure. The libraries also include oligonucleotide
analogue members which form conformationally-restricted
probes, such as unimolecular double-stranded
probes or unimolecular double-stranded probes which
present a third chemical structure of interest. For instance,
the array of oligonucleotide analogues optionally includes a
plurality of different members, each member having the
formula: Y-L^1-X^1-L^2-X^2, wherein Y is a solid
substrate, X^1 and X^2 are complementary oligonucleotides
containing at least one nucleotide analogue, L^1 is a spacer,
and L^2 is a linking group having sufficient length such that
X^1 and X^2 form a double-stranded oligonucleotide. An array
of such members comprises a library of unimolecular
double-stranded oligonucleotide analogues. In another embodiment,
the members of the array of oligonucleotides are arranged to
present a moiety of interest within the oligonucleotide
analogue probes of the array. For instance, the arrays are
optionally conformationally restricted having the formula
-X^11-Z-X^12, wherein X^11 and X^12 are complementary
oligonucleotides or oligonucleotide analogues and Z is a
chemical structure comprising the binding site of interest.

Oligonucleotide analogue arrays are synthesized on a
solid substrate by a variety of methods, including
light-directed chemical coupling, and selectively flowing
synthetic reagents over portions of the solid substrate. The solid
substrate is prepared for synthesis or attachment of
oligonucleotides by treatment with suitable reagents. For
example, glass is prepared by treatment with silane reagents.

The present invention provides methods for determining
whether a molecule of interest binds members of the
oligonucleotide analogue array. For instance, in one embodiment,
a target molecule is hybridized to the array and the resulting
hybridization pattern is determined. The target molecule
includes genomic DNA, cDNA, unspliced RNA, mRNA,
and rRNA, nucleic acid analogues, proteins and chemical
polymers. The target molecules are optionally amplified
prior to being hybridized to the array, e.g. by PCR, LCR, or
cloning methods.

The oligonucleotide analogue members of the array used
in the above methods are synthesized by any described
method for creating arrays. In one embodiment, the
oligonucleotide analogue members are attached to the solid
substrate, or synthesized on the solid substrate by
light-directed very large scale immobilized polymer synthesis,
e.g., using photo-removable protecting groups during
synthesis. In another embodiment, the oligonucleotide members
are attached to the solid substrate by forming a plurality of
channels adjacent to the surface of said substrate, placing
selected monomers in said channels to synthesize
oligonucleotide analogues at predetermined portions of selected
regions, wherein the portion of the selected regions
comprise oligonucleotide analogues different from
oligonucleotide analogues in at least one other of the selected regions,
and repeating the steps with the channels formed along a
second portion of the selected regions. The solid substrate is
any suitable material as described above, including beads,
slides, and arrays, each of which is constructed from, e.g.,
silica, polymers and glass.

DEFINITIONS

An "Oligonucleotide" is a nucleic acid sequence
composed of two or more nucleotides. An oligonucleotide is
optionally derived from natural sources, but is often
synthesized chemically. It is of any size. An "oligonucleotide
analogue" refers to a polymer with two or more monomeric
subunits, wherein the subunits have some structural features
in common with a naturally occurring oligonucleotide which
allow it to hybridize with a naturally occurring
oligonucleotide in solution. For instance, structural groups are
optionally added to the ribose or base of a nucleoside for
incorporation into an oligonucleotide, such as a methyl or allyl
group at the 2'-O position on the ribose, or a fluoro group
which substitutes for the 2'-O group, or a bromo group on
the ribonucleoside base. The phosphodiester linkage, or
"sugar-phosphate backbone" of the oligonucleotide
analogue is substituted or modified, for instance with methyl
phosphonates or O-methyl phosphates. Another example of
an oligonucleotide analogue for purposes of this disclosure
includes "peptide nucleic acids" in which native or modified
nucleic acid bases are attached to a polyamide backbone.
Oligonucleotide analogues optionally comprise a mixture of
naturally occurring nucleotides and nucleotide analogues.
However, an oligonucleotide which is made entirely of
naturally occurring nucleotides (i.e., those comprising DNA
or RNA), with the exception of a protecting group on the end
of the oligonucleotide, such as a protecting group used
during nucleic acid synthesis, is not considered an
oligonucleotide analogue for purposes of this invention.

A "nucleoside" is a pentose glycoside in which the
aglycone is a heterocyclic base; upon the addition of a
phosphate group the compound becomes a nucleotide. The
major biological nucleosides are β-glycoside derivatives of
D-ribose or D-2-deoxyribose. Nucleotides are phosphate
esters of nucleosides which are generally acidic in
solution due to the hydroxy groups on the phosphate. The
nucleosides of DNA and RNA are connected together via
phosphate units attached to the 3' position of one pentose and the
5' position of the next pentose. Nucleotide analogues and/or
nucleoside analogues are molecules with structural
similarities to the naturally occurring nucleotides or nucleosides as
discussed above in the context of oligonucleotide analogues.

A "nucleic acid reagent" utilized in standard automated
oligonucleotide synthesis typically carries a protected
phosphate on the 3' hydroxyl of the ribose. Thus, nucleic acid
reagents are referred to as nucleotides, nucleotide reagents,
nucleoside reagents, nucleoside phosphates,
nucleoside-3'-phosphates, nucleoside phosphoramidites,
phosphoramidites, nucleoside phosphonates, phosphonates
and the like. It is generally understood that nucleotide
reagents carry a reactive, or activatible, phosphoryl or
phosphonyl moiety in order to form a phosphodiester
linkage.

A "protecting group" as used herein, refers to any of the
groups which are designed to block one reactive site in a
molecule while a chemical reaction is carried out at another
reactive site. More particularly, the protecting groups used
herein are optionally any of those groups described in
Greene et al., Protective Groups In Organic Chemistry, 2nd
Ed., John Wiley & Sons, New York, NY, 1991 which is
incorporated herein by reference. The proper selection of
protecting groups for a particular synthesis is governed by
the overall methods employed in the synthesis. For example,
in "light-directed" synthesis, discussed herein, the
protecting groups are photolabile protecting groups such as NVOC,
MeNPoc, and those disclosed in co-pending Application
PCT/US93/10162 (filed Oct. 22, 1993), incorporated herein
by reference. In other methods, protecting groups are
removed by chemical methods and include groups such as
FMOC, DMT and others known to those of skill in the art.

A "purine" is a generic term based upon the specific
compound "purine" having a skeletal structure derived from
the fusion of a pyrimidine ring and an imidazole ring. It is
generally, and herein, used to describe a generic class of
compounds which have an atom or a group of atoms added
to the parent purine compound, such as the bases found in
the naturally occurring nucleic acids adenine
(6-aminopurine) and guanine (2-amino-6-oxopurine), or less
commonly occurring molecules such as 2-amino-adenine,
N6-methyladenine, or 2-methylguanine.

A "purine analogue" has a heterocyclic ring with struc- 
tural similarities to a purine, in which an atom or group of 
atoms is substituted for an atom in the purine ring. For 
instance, in one embodiment, one or more N atoms of the 
purine heterocyclic ring are replaced by C atoms.

A "pyrimidine" is a compound with a specific heterocy- 
clic diazine ring structure, but is used generically by persons 
of skill and herein to refer to any compound having a 
1,3-diazine ring with minor additions, such as the common
nucleic acid bases cytosine, thymine, uracil, 
5-methylcytosine and 5-hydroxymethylcytosine, or the non- 
naturally occurring 5-bromo-uracil. 

A "pyrimidine analogue" is a compound with structural 
similarity to a pyrimidine, in which one or more atoms in the
pyrimidine ring are substituted. For instance, in one
embodiment, one or more of the N atoms of the ring are 
substituted with C atoms. 

A "solid substrate" has a fixed organizational support
matrix, such as silica, polymeric materials, or glass. In some
embodiments, at least one surface of the substrate is partially
planar. In other embodiments it is desirable to physically
separate regions of the substrate to delineate synthetic
regions, for example with trenches, grooves, wells or the
like. Examples of solid substrates include slides, beads and
arrays.

DESCRIPTION OF THE DRAWINGS 

FIG. 1 shows four panels (FIG. 1A, FIG. 1B, FIG. 1C and
FIG. 1D). FIGS. 1A and 1B graphically display the
difference in fluorescence intensity between the matched and
mismatched DNA probes. FIGS. 1C and 1D illustrate the
difference in fluorescence intensity versus location on an
example chip for DNA and RNA targets, respectively.

FIG. 2 is a graphic illustration of specific light-directed
chemical coupling of oligonucleotide analogue monomers to
an array.

FIG. 3 shows the relative efficiency and specificity of 
hybridization for immobilized probe arrays containing 
adenine versus probe arrays containing 2,6-diaminopurine 
nucleotides. (3'-CATCGTAGAA-5' (SEQ ID NO:1)).

FIG. 4 shows the effect of substituting adenine with 
2,6-diaminopurine (D) in immobilized poly-dA probe 
arrays. (AAAAANAAAAA (SEQ ID NO:2)).

FIG. 5 shows the effects of substituting
5-propynyl-2'-deoxyuridine and 2-amino-2'-deoxyadenosine in AT arrays
on hybridization to a target nucleic acid. (ATATAArATA 
(SEQ ID NO:3) and CGCGCCGCGC (SEQ ID NO:4)). 

FIG. 6 shows the effects of dI and 7-deaza-dG substitutions
in oligonucleotide arrays. (3'-ATGTT(G1G2G3G4G5)CGGGT-5'
(SEQ ID NO:5)).

DETAILED DESCRIPTION 

Methods of synthesizing desired single stranded
oligonucleotide and oligonucleotide analogue sequences are
known to those of skill in the art. In particular, methods of
synthesizing oligonucleotides and oligonucleotide analogues
are found in, for example, Oligonucleotide Synthesis:
A Practical Approach, Gait, ed., IRL Press, Oxford (1984);
W. H. A. Kuijpers Nucleic Acids Research 18(17), 5197
(1994); K. L. Dueholm J. Org. Chem. 59, 5767-5773
(1994), and S. Agrawal (ed.) Methods in Molecular Biology,
volume 20, each of which is incorporated herein by
ence in its entirety for all purposes. Synthesizing unimo- 
lecular double-stranded DNA in solution has also been 
described. See, copending application Ser. No. 08/327,687, 
now U.S. Pat. No. 5,556,752 which is incorporated herein 
for all purposes. 

Improved methods of forming large arrays of 
oligonucleotides, peptides and other polymer sequences 
with a minimal number of synthetic steps are known. See, 
Pirrung et al., U.S. Pat. No. 5,143,854 (see also, PCT 
Application No. WO 90/15070) and Fodor et al., PCT 
Publication No. WO 92/10092, which are incorporated 
herein by reference, which disclose methods of forming vast 
arrays of peptides, oligonucleotides and other molecules 
using, for example, light-directed synthesis techniques. See 
also, Fodor et al., (1991) Science, 251, 767-77 which is 
incorporated herein by reference for all purposes. These 
procedures for synthesis of polymer arrays are now referred 
to as VLSIPS™ procedures.

Using the VLSIPS™ approach, one heterogeneous array of
polymers is converted, through simultaneous coupling at a 
number of reaction sites, into a different heterogenous array. 
See, U.S. application Ser. No. 07/796,243 now U.S. Pat. No. 
5,384,261 and U.S. application Ser. No. 07/980,523 now 
U.S. Pat. No. 5,677,195, the disclosures of which are incor- 
porated herein for all purposes. 

The development of VLSIPS™ technology as described 
in the above-noted U.S. Pat. No. 5,143,854 and PCT patent 
publication Nos. WO 90/15070 and 92/10092 is considered 
pioneering technology in the fields of combinatorial synthe- 
sis and screening of combinatorial libraries. More recently, 
patent application Ser. No. 08/082,937, filed Jun. 25, 1993 
(incorporated herein by reference), describes methods for 
making arrays of oligonucleotide probes that are used to 
check or determine a partial or complete sequence of a target 
nucleic acid and to detect the presence of a nucleic acid 
containing a specific oligonucleotide sequence.

Combinatorial Synthesis of Oligonucleotide Arrays

VLSIPS™ technology provides for the combinatorial
synthesis of oligonucleotide arrays. The combinatorial 
VLSIPS™ strategy allows for the synthesis of arrays con- 
taining a large number of related probes using a minimal 
number of synthetic steps. For instance, it is possible to 
synthesize and attach all possible DNA 8mer
oligonucleotides (4^8, or 65,536 possible combinations) using only 32
chemical synthetic steps. In general, VLSIPS™ procedures
provide a method of producing 4^n different oligonucleotide
probes on an array using only 4n synthetic steps.
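
The arithmetic in the preceding paragraph can be checked with a short calculation. The following is a minimal sketch only; the function name and the default four-letter alphabet are illustrative assumptions and do not appear in the specification.

```python
def probes_and_steps(n: int, alphabet_size: int = 4) -> tuple[int, int]:
    """Return (distinct n-mer probes, coupling steps) for a combinatorial array.

    Assumes one coupling step per (position, base) pair, as stated in the text:
    alphabet_size**n probes from alphabet_size*n steps.
    """
    if n < 1:
        raise ValueError("probe length must be at least 1")
    return alphabet_size ** n, alphabet_size * n


if __name__ == "__main__":
    for length in (4, 8, 10):
        probes, steps = probes_and_steps(length)
        print(f"{length}-mers: {probes} probes from {steps} synthetic steps")
```

For n = 8 the function returns 65,536 probes and 32 steps, matching the 8mer example given above.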

In brief, the light-directed combinatorial synthesis of 
oligonucleotide arrays on a glass surface proceeds using 
automated phosphoramidite chemistry and chip masking 
techniques. In one specific implementation, a glass surface 
is derivatized with a silane reagent containing a functional 
group, e.g., a hydroxyl or amine group blocked by a
photolabile protecting group. Photolysis through a
photolithographic mask is used selectively to expose functional groups
which are then ready to react with incoming
5'-photoprotected nucleoside phosphoramidites. See, FIG. 2.
The phosphoramidites react only with those sites which are 
illuminated (and thus exposed by removal of the photolabile 
blocking group). Thus, the phosphoramidites only add to
those areas selectively exposed from the preceding step.
These steps are repeated until the desired array of sequences
has been synthesized on the solid surface. Combinatorial
synthesis of different oligonucleotide analogues at different 
locations on the array is determined by the pattern of 
illumination during synthesis and the order of addition of 
coupling reagents. 
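
The masking cycle described above can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the process of the specification: it assumes perfect deprotection and coupling, one mask per (position, base) pair, and it ignores the direction of chain growth. The name synthesize_all_nmers is hypothetical.

```python
from itertools import product

BASES = "ACGT"


def synthesize_all_nmers(n: int):
    """Simulate mask-directed synthesis of every n-mer, one array site per probe.

    Returns (targets, grown, steps): the intended sequence at each site, the
    sequence actually built at each site, and the number of coupling steps.
    """
    targets = ["".join(p) for p in product(BASES, repeat=n)]
    grown = [""] * len(targets)
    steps = 0
    for position in range(n):
        for base in BASES:
            # The "mask" illuminates only sites whose next monomer is `base`.
            mask = [target[position] == base for target in targets]
            for site, exposed in enumerate(mask):
                if exposed:
                    # Only deprotected (illuminated) sites couple the monomer.
                    grown[site] += base
            steps += 1
    return targets, grown, steps


if __name__ == "__main__":
    targets, grown, steps = synthesize_all_nmers(3)
    print(len(targets), "sites built in", steps, "steps; correct:", grown == targets)
```

Each site is exposed exactly once per position, so every probe gains one monomer per group of four masks, which is why the step count grows as 4n while the number of probes grows as 4^n.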

In the event that an oligonucleotide analogue with a 
polyamide backbone is used in the VLSIPS™ procedure, it 
is generally inappropriate to use phosphoramidite chemistry 
to perform the synthetic steps, since the monomers do not 
attach to one another via a phosphate linkage. Instead, 
peptide synthetic methods are substituted. See, e.g., Pirrung
et al. U.S. Pat. No. 5,143,854. 

Peptide nucleic acids, which comprise a polyamide
backbone and the bases found in naturally occurring
nucleosides, are commercially available from, e.g.,
Biosearch, Inc. (Bedford, Mass.). Peptide nucleic acids are
capable of binding to nucleic acids with high specificity,
and are considered
"oligonucleotide analogues" for purposes of this disclosure.
Note that peptide nucleic acids optionally comprise bases 
other than those which are naturally occurring. 

Hybridization of Nucleotide Analogues 

The stability of duplexes formed between RNAs or DNAs
is generally in the order of
RNA:RNA>RNA:DNA>DNA:DNA, in solution. Long
probes have better duplex stability with a target, but poorer
mismatch discrimination than shorter probes (mismatch
discrimination refers to the measured hybridization signal
ratio between a perfect match probe and a single base
mismatch probe). Shorter probes (e.g., 8-mers) discriminate
mismatches very well, but the overall duplex stability is low.
In order to optimize mismatch discrimination and duplex 
stability, the present invention provides a variety of nucle- 
otide analogues incorporated into polymers and attached in 
an array to a solid substrate. 
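
Mismatch discrimination, as defined in the paragraph above, is a simple ratio of two measured signals. The sketch below is illustrative only: the function name is hypothetical and the intensity values are invented placeholders in arbitrary units, not measurements from the specification.

```python
def mismatch_discrimination(perfect_match_signal: float, mismatch_signal: float) -> float:
    """Ratio of the perfect match signal to the single base mismatch signal."""
    if perfect_match_signal < 0 or mismatch_signal <= 0:
        raise ValueError("signals must be non-negative, and the mismatch signal non-zero")
    return perfect_match_signal / mismatch_signal


if __name__ == "__main__":
    # Hypothetical fluorescence intensities (arbitrary units), chosen only to
    # illustrate the trade-off described in the text.
    example_signals = {
        "short probe (weak signal, good discrimination)": (400.0, 50.0),
        "long probe (strong signal, poor discrimination)": (2000.0, 1000.0),
    }
    for label, (pm, mm) in example_signals.items():
        print(f"{label}: ratio = {mismatch_discrimination(pm, mm):.1f}")
```

A higher ratio means the perfect match probe is more easily distinguished from its single base mismatch, independently of the absolute signal strength.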

Altering the thermal stability (Tm) of the duplex formed
between the target and the probe using, e.g., known
oligonucleotide analogues allows for optimization of duplex
stability and mismatch discrimination. One useful aspect of
altering the Tm arises from the fact that Adenine-Thymine
(A-T) duplexes have a lower Tm than Guanine-Cytosine
(G-C) duplexes, due in part to the fact that the A-T duplexes
have 2 hydrogen bonds per base pair, while the G-C
duplexes have 3 hydrogen bonds per base pair. In
heterogeneous oligonucleotide arrays in which there is a
non-uniform distribution of bases, it can be difficult to optimize
hybridization conditions for all probes simultaneously. Thus, 
in some embodiments, it is desirable to destabilize G-C-rich 
duplexes and/or to increase the stability of A-T-rich duplexes 
while maintaining the sequence specificity of hybridization. 
This is accomplished, e.g., by replacing one or more of the 
native nucleotides in the probe (or the target) with certain 
modified, non-standard nucleotides. Substitution of guanine 
residues with 7-deazaguanine, for example, will generally 
destabilize duplexes, whereas substituting adenine residues 
with 2,6-diaminopurine will enhance duplex stability. A 
variety of other modified bases are also incorporated into 
nucleic acids to enhance or decrease overall duplex stability 
while maintaining specificity of hybridization. The incorpo- 
ration of 6-aza-pyrimidine analogs into oligonucleotide 
probes generally decreases their binding affinity for comple- 
mentary nucleic acids. Many 5-substituted pyrimidines sub- 
stantially increase the stability of hybrids in which they have 
been substituted in place of the native pyrimidines in the 
sequence. Examples include 5-bromo-, 5-methyl-,
5-propynyl-, 5-(imidazol-2-yl)- and 5-(thiazol-2-yl)-
derivatives of cytosine and uracil.
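
The hydrogen-bond bookkeeping behind the A-T versus G-C comparison above can be sketched as follows. This is a rough illustration only: counting Watson-Crick hydrogen bonds ignores base stacking, salt and sequence-context effects and is not a Tm prediction. The function name is hypothetical, and the letter D for 2,6-diaminopurine (which pairs with thymine through three hydrogen bonds) follows the usage in the description of FIG. 4.

```python
# Hydrogen bonds contributed by each probe base when paired with its complement.
HBONDS_PER_BASE = {
    "A": 2,  # A-T pair
    "T": 2,
    "G": 3,  # G-C pair
    "C": 3,
    "D": 3,  # 2,6-diaminopurine paired with T
}


def watson_crick_hbonds(probe: str) -> int:
    """Total Watson-Crick hydrogen bonds in a fully matched duplex of `probe`."""
    try:
        return sum(HBONDS_PER_BASE[base] for base in probe.upper())
    except KeyError as err:
        raise ValueError(f"unsupported base: {err.args[0]}") from None


if __name__ == "__main__":
    for probe in ("AAAAAAAAAAA", "AAAAADAAAAA", "CGCGCCGCGC"):
        print(probe, watson_crick_hbonds(probe))
```

Replacing a single A with D in an 11-mer poly-dA probe raises the count from 22 to 23, which illustrates in the crudest terms why such a substitution is expected to stabilize A-T-rich duplexes.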

Many modified nucleosides, nucleotides and various
bases suitable for incorporation into nucleosides are
commercially available from a variety of manufacturers,
including the SIGMA chemical company (Saint Louis, Mo.), R&D
systems (Minneapolis, Minn.), Pharmacia LKB Biotechnology
(Piscataway, N.J.), CLONTECH Laboratories, Inc.
(Palo Alto, Calif.), Chem Genes Corp., Aldrich Chemical
Company (Milwaukee, Wis.), Glen Research, Inc., GIBCO
BRL Life Technologies, Inc. (Gaithersburg, Md.), Fluka
Chemica-Biochemika Analytika (Fluka Chemie AG, Buchs,
Switzerland), Invitrogen, San Diego, Calif., and Applied
Biosystems (Foster City, Calif.), as well as many other
commercial sources known to one of skill. Methods of
attaching bases to sugar moieties to form nucleosides are
known. See, e.g., Lukevics and Zablocka (1991), Nucleoside
Synthesis: Organosilicon Methods, Ellis Horwood Limited,
Chichester, West Sussex, England and the references therein.
Methods of phosphorylating nucleosides to form
nucleotides, and of incorporating nucleotides into
oligonucleotides are also known. See, e.g., Agrawal (ed) (1993)
Protocols for Oligonucleotides and Analogues, Synthesis
and Properties, Methods in Molecular Biology volume 20,
Humana Press, Totowa, N.J., and the references therein. See
also, Crooke and Lebleu, and Sanghvi and Cook, and the
references cited therein, both supra.

Groups are also linked to various positions on the
nucleoside sugar ring or on the purine or pyrimidine rings which
may stabilize the duplex by electrostatic interactions with
the negatively charged phosphate backbone, or through
hydrogen bonding interactions in the major and minor
grooves. For example, adenosine and guanosine nucleotides
are optionally substituted at the N2 position with an
imidazolyl propyl group, increasing duplex stability. Universal
base analogues such as 3-nitropyrrole and 5-nitroindole are
optionally included in oligonucleotide probes to improve
duplex stability through base stacking interactions.

Selecting the length of oligonucleotide probes is also an
important consideration when optimizing hybridization
specificity. In general, shorter probe sequences are more
specific than longer ones, in that the occurrence of a
single-base mismatch has a greater destabilizing effect on the
hybrid duplex. However, because the overall thermodynamic
stability of hybrids decreases as probes become shorter, in
some embodiments it is desirable to enhance duplex stability
for short probes globally. Certain modifications of the sugar
moiety in oligonucleotides provide useful stabilization, and
these can be used to increase the affinity of probes for
complementary nucleic acid sequences. For example,
2'-O-methyl-, 2'-O-propyl-, and 2'-O-allyl-oligoribonucleotides
have higher binding affinities for complementary RNA
sequences than their unmodified counterparts. Probes
comprised of 2'-fluoro-2'-deoxyoligoribonucleotides also form
more stable hybrids with RNA than do their unmodified
counterparts.

Replacement or substitution of the internucleotide phos- 
phodiester linkage in oligo- or poly-nucleotides is also used 
to either increase or decrease the affinity of probe-target 
interactions. For example, substituting phosphodiester link- 
ages with phosphorothioate or phosphorodithioate linkages 
generally lowers duplex stability, without affecting sequence 
specificity. Substitutions with a non-ionic methylphospho- 
nate linkage (racemic, or preferably, Rp stereochemistry) 
have a stabilizing influence on hybrid formation. Neutral or 
cationic phosphoramidate linkages also result in enhanced 
duplex stabilization. The phosphate diester backbone has sugar-phosphate backbone has been replaced with a polya- 

been replaced with a variety of other stabilizing, non-natural mide structure. 

linkages which have been studied as potential antisense Thermal equilibrium studies, kinetic "on-rate" studies, 

therapeutic agents. See, e.g., Crooke and Lebleu (eds) and sequence specificity analysis is optionally performed for 

(1993) Antisense Research Applications CRC Press; and, 5 any taiget oligonucleotide and probe or probe analogue. The 

Sanghvi and Cook (eds) (1994) Carbohydrate modifications data . obtained shows the behavior of the analogues upon 

in Antisense Research ACS Symp. Ser. #580 ACS, Wash- d - formatioQ with target oligonucleotides. Altered 

ington DC. Very stable hybrids are formed between nucleic ^ &{ usi oli leotide analogue 

acids and probes comprised of peptide nucleic acids, in F besare ascertained by following, e.g., fluorescence signal 

which the entire sugar-phosphate backbone has been w oU leo £ de analo ^ e ^rays hybridized with 

replaced with a polyarmde structure. < a target oligonucleotide over time. The data allow optimi- 

Another important factor which sometimes affects the use zatk)n of cific hy5ridi2ation conditions at, e.g., room 

of oligonucleotide probe arrays is the nature of the target lemperature (for simplified diagnostic applications), 

nucleic acid. Oligodeoxynucleotide probes can hybridize to . . - . u . , . „,„ K *r*„ v u„ 

1- vxt a j n xt a . * -*u j-*c * « «,i™„*fi^*., Another way of verifying altered duplex stability is by 
DNA and RNA targets with different affinity and specificity. 15 J . . ' .f * i 
„ . ° . • • i « » „f following the signal intensity generated upon hybridization 
For example, probe sequences containing long "runs of & n * / & . £ KTA , fo * 

• j j • • j p ^..ui., with time. Previous experiments using DNA targets and 

consecutive deoxvadenosine residues lorm less stable ... . \. , . , . A ° .„ . - 4 , 

, rv, .X * " nxr a *u %u DNA chips have shown that signal intensity increases with 

hybrids with complementary RNA sequences than with the . j . t . t ' ° , . J t . 

J . rvvT a c u *% *■ fjA .u. time, and that the more stable duplexes generate higher 

complementary DNA sequences. Substitution of dA in the . ' e , , t 6 ™ . , 

u -*u -*u • ■ j *u^-^ signal intensities faster than less stable duplexes. The signals 

probe with either 2,6-diammopurme deoxynboside, or « n & . . ♦ ■ . * 

„ JA , u u -j- **utjkta reach a plateau or "saturate after a certain amount of time 

2'-alkoxy- or 2'-fluoro-dA enhances hybridization with RNA *, . . . . . . . , 

' ' due to all of the binding sites becoming occupied. These data 

f ' ... , . , allow for optimization of hybridization, and determination 

Internal structure within nucleic acid probes or the targets rf conditions at a specified temperature, 

also influences hybridization efficiency. For example, ^ _ r . 

. . ™„ MM / „„j ™„o„~>o ^ n t„n,'nn «„,„/ «f Graphs of signal mtensity and base mismatch positions 
GC-rich sequences, and sequences containing runs of consecutive
G residues frequently self-associate to form higher-order
structures, and this can inhibit their binding to complementary
sequences. See, Zimmermann et al. (1975) J. Mol. Biol. 92: 181;
Kim (1991) Nature 351: 331; Sen and Gilbert (1988) Nature 335:
364; and Sunquist and Klug (1989) Nature 342: 825. These
structures are selectively destabilized by the substitution of
one or more guanine residues with one or more of the following
purines or purine analogs: 7-deazaguanine,
8-aza-7-deazaguanine, 2-aminopurine, 1H-purine, and
hypoxanthine, in order to enhance hybridization.

Modified nucleic acids and nucleic acid analogs can also be used
to improve the chemical stability of probe arrays. For example,
certain processes and conditions that are useful for either the
fabrication or subsequent use of the arrays, may not be
compatible with standard oligonucleotide chemistry, and
alternate chemistry can be employed to overcome these problems.
For example, exposure to acidic conditions will cause
depurination of purine nucleotides, ultimately resulting in
chain cleavage and overall degradation of the probe array. In
this case, adenine and guanine are replaced with 7-deazaadenine
and 7-deazaguanine, respectively, in order to stabilize the
oligonucleotide probes towards acidic conditions which are used
during the manufacture or use of the arrays.

Base, phosphate and sugar modifications are used in combination
to make highly modified oligonucleotide analogues which take
advantage of the properties of each of the various
modifications. For example, oligonucleotides which have higher
binding affinities for complementary sequences than their
unmodified counterparts (e.g., 2'-O-methyl-, 2'-O-propyl-, and
2'-O-allyl oligonucleotides) can be incorporated into
oligonucleotides with modified bases (deazaguanine,
8-aza-7-deazaguanine, 2-aminopurine, 1H-purine, hypoxanthine
and the like) with non-ionic methylphosphonate linkages or
neutral or cationic phosphoramidate linkages, resulting in
additive stabilization of duplex formation between the
oligonucleotide and a target nucleic acid. For instance, one
preferred oligonucleotide comprises a
2'-O-methyl-2,6-diaminopurineriboside phosphorothioate.
Similarly, any of the modified bases described herein can be
incorporated into peptide nucleic acids, in which the entire

[...] are plotted and the ratios of perfect match versus
mismatch [...] calculated. This calculation shows the sequence
of nucleotide [...]. Perfect match/mismatch ratios greater than
4 are often desirable in an oligonucleotide diagnostic assay
because, for a diploid genome, ratios of 2 have to be
distinguished (e.g., in the case of a heterozygous trait or
sequence).

Target Nucleic Acids Which Comprise Nucleotide Analogues

Modified nucleotides and nucleotide analogues are incorporated
synthetically or enzymatically into DNA or RNA target nucleic
acids for hybridization analysis to oligonucleotide arrays. The
incorporation of nucleotide analogues in the target optimizes
the hybridization of the target in terms of sequence specificity
and/or the overall affinity of binding to oligonucleotide and
oligonucleotide analogue probe arrays. The use of nucleotide
analogues in either the oligonucleotide array or the target
nucleic acid, or both, improves optimizability of hybridization
interactions. Examples of useful nucleotide analogues which are
substituted for naturally occurring nucleotides include
7-deazaguanosine, 2,6-diaminopurine nucleotides, 5-propynyl and
other 5-substituted pyrimidine nucleotides, 2'-fluoro and
2'-methoxy-2'-deoxynucleotides and the like.

These nucleotide analogues are incorporated into nucleic acids
using the synthetic methods described supra, or using DNA or RNA
polymerases. The nucleotide analogues are preferably
incorporated into target nucleic acids using in vitro
amplification methods such as PCR, LCR, Qβ-replicase expansion,
in vitro transcription (e.g., nick translation or random-primer
transcription) and the like. Alternatively, the nucleotide
analogues are optionally incorporated into cloned nucleic acids
by culturing a cell which comprises the cloned nucleic acid in
media which includes a nucleotide analogue.

Similar to the use of nucleotide analogues in probe arrays,
7-deazaguanosine is used in target nucleic acids to substitute
for G/dG to enhance target hybridization by reducing secondary
structure in sequences containing runs of poly-G/dG.
2,6-diaminopurine nucleotides substitute for A/dA to enhance
target hybridization through enhanced H-bonding to T or U rich
probes. 5-propynyl and other 5-substituted pyrimidine
nucleotides substitute for natural pyrimidines to enhance target
hybridization to certain purine rich probes. 2'-fluoro and
2'-methoxy-2'-deoxynucleotides substitute for natural
nucleotides to enhance target hybridization to similarly
substituted probe sequences.

Synthesis of 5'-photoprotected 2'-O alkyl ribonucleotide
analogues

The light-directed synthesis of complex arrays of nucleotide
analogues on a glass surface is achieved by derivatizing
cyanoethyl phosphoramidite nucleotides and nucleotide analogues
(e.g., nucleoside analogues of uridine, thymidine, cytidine,
adenosine and guanosine, with phosphates) with, for example, the
photolabile MeNPoc group in the 5'-hydroxyl position instead of
the usual dimethoxytrityl group. See, application SN
PCT/US94/12305.

Specific base-protected 2'-O alkyl nucleosides are commercially
available, from, e.g., Chem Genes Corp. (MA). The photolabile
MeNPoc group is added to the 5'-hydroxyl position followed by
phosphitylation to yield cyanoethyl phosphoramidite monomers.
Commercially available nucleosides are optionally modified
(e.g., by 2'-O-alkylation) to create nucleoside analogues which
are used to generate oligonucleotide analogues.

Modifications to the above procedures are used in some
embodiments to avoid significant addition of MeNPoc to the
3'-hydroxyl position. For instance, in one embodiment, a
2'-O-methyl ribonucleotide analogue is reacted with DMT-Cl
{di(p-methoxyphenyl)phenylchloride} in the presence of pyridine
to generate a 2'-O-methyl-5'-O-DMT ribonucleotide analogue. This
allows for the addition of TBDMS to the 3'-O of the
ribonucleoside analogue by reaction with TBDMS-Triflate
(t-butyldimethylsilyltrifluoromethanesulfonate) in the presence
of triethylamine in THF (tetrahydrofuran) to yield a
2'-O-methyl-3'-O-TBDMS-5'-O-DMT ribonucleotide base analogue.
This analogue is treated with TCAA (trichloroacetic acid) to
cleave off the DMT group, leaving a reactive hydroxyl group at
the 5' position. MeNPoc is then added to the oxygen of the 5'
hydroxyl group using MeNPoc-Cl in the presence of pyridine. The
TBDMS group is then cleaved with F⁻ (e.g., NaF) to yield a
ribonucleotide base analogue with a MeNPoc group attached to the
5' oxygen on the nucleotide analogue. If appropriate, this
analogue is phosphitylated to yield a phosphoramidite for
oligonucleotide analogue synthesis. Other nucleosides or
nucleoside analogues are protected by similar procedures.

Synthesis of Oligonucleotide Analogue Arrays on Chips

Other than the use of photoremovable protecting groups, the
nucleoside coupling chemistry used in VLSIPS™ technology for
synthesizing oligonucleotides and oligonucleotide analogues on
chips is similar to that used for oligonucleotide synthesis. The
oligonucleotide is typically linked to the substrate via the
3'-hydroxyl group of the oligonucleotide and a functional group
on the substrate which results in the formation of an ether,
ester, carbamate or phosphate ester linkage. Nucleotide or
oligonucleotide analogues are attached to the solid support via
carbon-carbon bonds using, for example, supports having
(poly)trifluorochloroethylene surfaces, or preferably, by
siloxane bonds (using, for example, glass or silicon oxide as
the solid support). Siloxane bonds with the surface of the
support are formed in one embodiment via reactions of surface
attaching portions bearing trichlorosilyl or trialkoxysilyl
groups. The surface attaching groups have a site for attachment
of the oligonucleotide analogue portion. For example, groups
which are suitable for attachment include amines, hydroxyl,
thiol, and carboxyl. Preferred surface attaching or derivatizing
portions include aminoalkylsilanes and hydroxyalkylsilanes. In
particularly preferred embodiments, the surface attaching
portion of the oligonucleotide analogue is either
bis(2-hydroxyethyl)-aminopropyltriethoxysilane,
n-(3-triethoxysilylpropyl)-4-hydroxybutylamide,
aminopropyltriethoxysilane or hydroxypropyltriethoxysilane.

The oligoribonucleotides generated by synthesis using ordinary
ribonucleotides are usually base labile due to the presence of
the 2'-hydroxyl group. 2'-O-methyloligoribonucleotides
(2'-OMeORNs), analogues of RNA where the 2'-hydroxyl group is
methylated, are DNAse and RNAse resistant, making them less base
labile. Sproat, B. S., and Lamond, A. I. in Oligonucleotides and
Analogues: A Practical Approach, edited by F. Eckstein, New
York: IRL Press at Oxford University Press, 1991, pp. 49-86,
incorporated herein by reference for all purposes, have reported
the synthesis of mixed sequences of
2'-O-Methoxy-oligoribonucleotides (2'-O-MeORNs) using
dimethoxytrityl phosphoramidite chemistry. These 2'-O-MeORNs
display greater binding affinity for complementary nucleic acids
than their unmodified counterparts.

Other embodiments of the invention provide mechanical means to
generate oligonucleotide analogues. These techniques are
described in [...] Ser. No. 07/796,243, filed Nov. 22, 1991,
which is incorporated herein by reference in its entirety for
all purposes. Essentially, oligonucleotide analogue reagents are
directed over the surface of a substrate such that a predefined
array of oligonucleotide analogues is created. For instance, a
series of channels, grooves, or spots are formed on or adjacent
to a substrate. Reagents are selectively flowed or deposited in
the channels, grooves or spots, forming an array having
different oligonucleotides and/or oligonucleotide analogues at
selected locations on the substrate.

Detection of Hybridization

In one embodiment, hybridization is detected by labeling a
target with, e.g., fluorescein or other known visualization
agents and incubating the target with an array of
oligonucleotide analogue probes. Upon duplex formation by the
target with a probe in the array (or triplex formation in
embodiments where the array comprises unimolecular
double-stranded probes), the fluorescein label is excited by,
e.g., an argon laser and detected by viewing the array, e.g.,
through a scanning confocal microscope.

Sequencing by hybridization

Current sequencing methodologies are highly reliant on complex
procedures and require substantial manual effort. Conventional
DNA sequencing technology is a laborious procedure requiring
electrophoretic size separation of labeled DNA fragments. An
alternative approach involves a hybridization strategy carried
out by attaching target DNA to a surface. The target is
interrogated with a set of oligonucleotide probes, one at a time
(see, application SN PCT/US94/12305).

A preferred method of oligonucleotide probe array synthesis
involves the use of light to direct the synthesis of
oligonucleotide analogue probes in high-density, miniaturized
arrays. Matrices of spatially-defined oligonucleotide analogue
probe arrays were generated. The ability to use these arrays to
identify complementary sequences was demonstrated by hybridizing
fluorescent labeled oligonucleotides to the matrices produced.

Oligonucleotide analogue arrays are used, e.g., to study
sequence specific hybridization of nucleic acids, or
protein-nucleic acid interactions. Oligonucleotide analogue
arrays are used to define the thermodynamic and kinetic rules
governing the formation and stability of oligonucleotide and
oligonucleotide analogue complexes.

Oligonucleotide analogue Probe Arrays and Libraries

The use of oligonucleotide analogues in probe arrays provides
several benefits as compared to standard oligonucleotide arrays.
For instance, as discussed supra, certain oligonucleotide
analogues have enhanced hybridization characteristics to
complementary nucleic acids as compared with oligonucleotides
made of naturally occurring nucleotides. One primary benefit of
enhanced hybridization characteristics is that oligonucleotide
analogue probes are optionally shorter than corresponding probes
which do not include nucleotide analogues.

Standard oligonucleotide probe arrays typically require fairly
long probes (about 15-25 nucleotides) to achieve strong binding
to target nucleic acids. The use of such long probes is
disadvantageous for two reasons. First, the longer the probe,
the more synthetic steps must be performed to make the probe and
any probe array comprising the probe. This increases the cost of
making the probes and arrays. Furthermore, as each synthetic
step results in less than 100% coupling for every nucleotide,
the quality of the probes degrades as they become longer.
Secondly, short probes provide better mis-match discrimination
for hybridization to a target nucleic acid. This is because a
single base mismatch for a short probe-target hybridization is
less destabilizing than a single mismatch for a long
probe-target hybridization. Thus, it is harder to distinguish a
single probe-target mismatch when the probe is a 20-mer than
when the probe is an 8-mer. Accordingly, the use of short
oligonucleotide analogue probes reduces costs and increases
mismatch discrimination in probe arrays.

The enhanced hybridization characteristics of oligonucleotide
analogues also allows for the creation of oligonucleotide
analogue probe arrays where the probes in the arrays have
substantial secondary structure. For instance, the
oligonucleotide analogue probes are optionally configured to be
fully or partially double stranded on the array. The probes are
optionally complexed with complementary nucleic acids, or are
optionally unimolecular oligonucleotides with
self-complementary regions. Libraries of diverse double-stranded
oligonucleotide analogue probes are used, for example, in
screening studies to determine binding affinity of nucleic acid
binding proteins, drugs, or oligonucleotides (e.g., to examine
triple helix formation). Specific oligonucleotide analogues are
known to be conducive to the formation of unusual secondary
structure. See, Durland (1995) Bioconjugate Chem. 6: 278-282.
General strategies for using unimolecular double-stranded
oligonucleotides as probes and for library generation is
described in application Ser. No. 08/327,687, and similar
strategies are applicable to oligonucleotide analogue probes.

In general, a solid support, which optionally has an attached
spacer molecule is attached to the distal end of the
oligonucleotide analogue probe. The probe is attached as a
single unit, or synthesized on the support or spacer in a
monomer by monomer approach using the VLSIPS™ or mechanical
partitioning methods described supra. Where the oligonucleotide
analogue arrays are fully double-stranded, oligonucleotides (or
oligonucleotide analogues) complementary to the probes on the
array are hybridized to the array.

In some embodiments, molecules other than oligonucleotides, such
as proteins, dyes, co-factors, linkers and the like are
incorporated into the oligonucleotide analogue probe, or
attached to the distal end of the oligomer, e.g., as a spacing
molecule, or as a probe or probe target. Flexible linkers are
optionally used to separate complementary portions of the
oligonucleotide analogue.

The present invention also contemplates the preparation of
libraries of oligonucleotide analogues having bulges or loops in
addition to complementary regions. Specific RNA bulges are often
recognized by proteins (e.g., TAR RNA is recognized by the TAT
protein of HIV). Accordingly, libraries of oligonucleotide
analogue bulges or loops are useful in a number of diagnostic
applications. The bulge or loop can be present in the
oligonucleotide analogue or linker portions.

Unimolecular analogue probes can be configured in a variety of
ways. In one embodiment, the unimolecular probes comprise
linkers, for example, where the probe is arranged according to
the formula Y-L1-X1-L2-X2, in which Y represents a solid
support, X1 and X2 represent a pair of complementary
oligonucleotides or oligonucleotide analogues, L1 represents a
bond or a spacer, and L2 represents a linking group having
sufficient length such that X1 and X2 form a double-stranded
oligonucleotide. The general synthetic and conformational
strategy used in generating the double-stranded unimolecular
probes is similar to that described in co-pending application
Ser. No. 08/327,687, except that any of the elements of the
probe (L1, X1, L2 and X2) comprises a nucleotide or an
oligonucleotide analogue. For instance, in one embodiment X1 is
an oligonucleotide analogue.

The oligonucleotide analogue probes are optionally arranged to
present a variety of moieties. For example, structural
components are optionally presented from the middle of a
conformationally restricted oligonucleotide analogue probe. In
these embodiments, the analogue probes generally have the
structure -X11-Z-X12 wherein X11 and X12 are complementary
oligonucleotide analogues and Z is a structural element
presented away from the surface of the probe array. Z can
include an agonist or antagonist for a cell membrane receptor, a
toxin, venom, viral epitope, hormone, peptide, enzyme, cofactor,
drug, protein, antibody or the like.

General tiling strategies for detection of a Polymorphism in a
target oligonucleotide

In diagnostic applications, oligonucleotide analogue arrays
(e.g., arrays on chips, slides or beads) are used to determine
whether there are any differences between a reference sequence
and a target oligonucleotide, e.g., whether an individual has a
mutation or polymorphism in a gene. As discussed supra, the
oligonucleotide target is optionally a nucleic acid such as a
PCR amplicon which comprises one or more nucleotide analogues.
In one embodiment, arrays are designed to contain probes
exhibiting complementarity to one or more selected reference
sequence whose sequence is known. The arrays are used to read a
target sequence comprising either the reference sequence itself
or variants of that sequence. Any polynucleotide of known
sequence is selected as a reference sequence. Reference
sequences of interest include sequences known to include
mutations or polymorphisms associated with phenotypic changes
having clinical significance in human patients. For example, the
CFTR gene and P53 gene in humans have been identified as the
location of several mutations resulting in cystic fibrosis or
cancer respectively. Other reference sequences of interest
include those that serve to identify pathogenic microorganisms
and/or are the site of mutations by which such microorganisms
acquire drug resistance (e.g., the HIV reverse transcriptase
gene for HIV resistance). Other reference sequences of interest
include regions where polymorphic variations are known to occur
(e.g., the D-loop region of mitochondrial DNA). These reference
sequences also have utility for, e.g., forensic, cladistic, or
epidemiological studies.

Other reference sequences of interest include those from
the genome of pathogenic viruses (e.g., hepatitis (A, B, or
C), herpes virus (e.g., VZV, HSV-1, HAV-6, HSV-II, CMV,
and Epstein Barr virus), adenovirus, influenza virus,
flaviviruses, echovirus, rhinovirus, coxsackie virus,
coronavirus, respiratory syncytial virus, mumps virus,
rotavirus, measles virus, rubella virus, parvovirus, vaccinia
virus, HTLV virus, dengue virus, papillomavirus, molluscum
virus, poliovirus, rabies virus, JC virus and arboviral
encephalitis virus). Other reference sequences of interest are
from genomes or episomes of pathogenic bacteria, particularly
regions that confer drug resistance or allow phylogenic
characterization of the host (e.g., 16S rRNA or corresponding
DNA). For example, such bacteria include chlamydia,
rickettsial bacteria, mycobacteria, staphylococci, streptococci,
pneumococci, meningococci and gonococci, klebsiella,
proteus, serratia, pseudomonas, legionella, diphtheria,
salmonella, bacilli, cholera, tetanus, botulism, anthrax,
plague, leptospirosis, and Lyme disease bacteria. Other
reference sequences of interest include those in which
mutations result in the following autosomal recessive
disorders: sickle cell anemia, β-thalassemia, phenylketonuria,
galactosemia, Wilson's disease, hemochromatosis, severe
combined immunodeficiency, alpha-1-antitrypsin
deficiency, albinism, alkaptonuria, lysosomal storage
diseases and Ehlers-Danlos syndrome. Other reference
sequences of interest include those in which mutations result
in X-linked recessive disorders: hemophilia,
glucose-6-phosphate dehydrogenase, agammaglobulinemia, diabetes
insipidus, Lesch-Nyhan syndrome, muscular dystrophy,
Wiskott-Aldrich syndrome, Fabry's disease and fragile
X-syndrome. Other reference sequences of interest include
those in which mutations result in the following autosomal
dominant disorders: familial hypercholesterolemia, polycystic
kidney disease, Huntington's disease, hereditary
spherocytosis, Marfan's syndrome, von Willebrand's
disease, neurofibromatosis, tuberous sclerosis, hereditary
hemorrhagic telangiectasia, familial colonic polyposis,
Ehlers-Danlos syndrome, myotonic dystrophy, muscular
dystrophy, osteogenesis imperfecta, acute intermittent
porphyria, and von Hippel-Lindau disease.

Although an array of oligonucleotide analogue probes is 
usually laid down in rows and columns for simplified data 
processing, such a physical arrangement of probes on the 
solid substrate is not essential. Provided that the spatial
location of each probe in an array is known, the data from 
the probes is collected and processed to yield the sequence 
of a target irrespective of the physical arrangement of the 
probes on, e.g., a chip. In processing the data, the hybrid-
ization signals from the respective probes are assembled into
any conceptual array desired for subsequent data reduction, 
whatever the physical arrangement of probes on the sub- 
strate. 
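
The point that physical placement is irrelevant once each probe's location is known can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `assemble_conceptual_array`, the `(row, column)` coordinates, and the intensity values are hypothetical and do not come from the specification.

```python
def assemble_conceptual_array(layout, signals):
    """Regroup hybridization signals by what each probe interrogates.

    layout  maps a physical (row, column) location on the substrate to
            (reference position, probe base); assumed known in advance.
    signals maps the same physical locations to measured intensities.
    """
    conceptual = {}
    for location, (position, probe_base) in layout.items():
        conceptual.setdefault(position, {})[probe_base] = signals[location]
    return conceptual


# Hypothetical chip: probes are scattered, not laid out in rows and columns.
layout = {(0, 5): (1, "A"), (3, 1): (0, "C"), (2, 2): (0, "A"), (7, 0): (1, "C")}
signals = {(0, 5): 120.0, (3, 1): 15.0, (2, 2): 300.0, (7, 0): 40.0}

conceptual = assemble_conceptual_array(layout, signals)
```

The resulting dictionary is indexed by reference position and probe base, so later data reduction never needs to consult the physical coordinates again.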

In one embodiment, a basic tiling strategy provides an
array of immobilized probes for analysis of a target
oligonucleotide showing a high degree of sequence similarity to
one or more selected reference oligonucleotides (e.g., detection
of a point mutation in a target sequence). For instance,
a first probe set comprises a plurality of probes exhibiting
perfect complementarity with a selected reference
oligonucleotide. The perfect complementarity usually exists
throughout the length of the probe. However, probes having
a segment or segments of perfect complementarity that is/are
flanked by leading or trailing sequences lacking
complementarity to the reference sequence can also be used. Within
a segment of complementarity, each probe in the first probe
set has at least one interrogation position that corresponds to
a nucleotide in the reference sequence. The interrogation
position is aligned with the corresponding nucleotide in the
reference sequence when the probe and reference sequence
are aligned to maximize complementarity between the two.
If a probe has more than one interrogation position, each
corresponds with a respective nucleotide in the reference
sequence. The identity of an interrogation position and
corresponding nucleotide in a particular probe in the first
probe set cannot be determined simply by inspection of the
probe in the first set. An interrogation position and
corresponding nucleotide is defined by the comparative structures
of probes in the first probe set and corresponding probes
from additional probe sets.

For each probe in the first set, there are, for purposes of
the present illustration, multiple corresponding probes from
additional probe sets. For instance, there are optionally
probes corresponding to each nucleotide of interest in the
reference sequence. Each of the corresponding probes has an
interrogation position aligned with that nucleotide of
interest. Usually, the probes from the additional probe sets are
identical to the corresponding probe from the first probe set
with one exception. The exception is the interrogation
position, which occurs in the same position in each of the
corresponding probes from the additional probe sets. This
position is occupied by a different nucleotide in the
corresponding probe sets. Other tiling strategies are also
employed, depending on the information to be obtained.
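
The basic tiling strategy described above can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the procedure used to design any actual chip: the function names, the default 8-nucleotide probe length, the choice of interrogation index, and the use of plain DNA bases are all assumptions made for the example. For every window of the reference, it builds the perfectly complementary probe (first probe set) together with the probes that carry each of the other three bases at the interrogation position (additional probe sets).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def basic_tiling(reference, probe_length=8, interrogation_index=3):
    """Build one tile per reference window.

    Each tile is (window start, perfect-match probe, {base: probe}), where the
    dictionary holds the four probes that differ only at the interrogation
    position. Probe length and interrogation index are illustrative choices.
    """
    tiles = []
    for start in range(len(reference) - probe_length + 1):
        window = reference[start:start + probe_length]
        perfect = reverse_complement(window)
        probes = {
            base: perfect[:interrogation_index] + base + perfect[interrogation_index + 1:]
            for base in "ACGT"
        }
        tiles.append((start, perfect, probes))
    return tiles


# Reference taken from SEQ ID NO: 6 in the examples of this document.
tiles = basic_tiling("CTGAACGGTAGCATCTTGAC")
```

In each tile exactly one of the four probes is the perfect match to the reference; a target carrying a point mutation at the interrogated nucleotide would instead be perfectly matched by one of the other three.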
The probes are oligonucleotide analogues which are
capable of hybridizing with a target nucleic sequence by
complementary base-pairing. Complementary base pairing
includes sequence-specific base pairing, which comprises,
e.g., Watson-Crick base pairing or other forms of base
pairing such as Hoogsteen base pairing. The probes are
attached by any appropriate linkage to a support. 3' attachment
is more usual as this orientation is compatible with the
preferred chemistry used in solid phase synthesis of
oligonucleotides and oligonucleotide analogues (with the
exception of, e.g., analogues which do not have a phosphate
backbone, such as peptide nucleic acids).

EXAMPLES 

The following examples are provided by way of illustration
only and not by way of limitation. A variety of parameters
can be changed or modified to yield essentially similar
results.

One approach to enhancing oligonucleotide hybridization
is to increase the thermal stability (Tm) of the duplex formed
between the target and the probe using oligonucleotide
analogues that are known to increase Tm's upon hybridization
to DNA. Enhanced hybridization using oligonucleotide
analogues is described in the examples below, including
enhanced hybridization in oligonucleotide arrays.

Example 1

Solution oligonucleotide melting Tm

The Tm of 2'-O-methyl oligonucleotide analogues was
compared to the Tm for the corresponding DNA and RNA
sequences in solution. In addition, the Tm of 2'-O-methyl
oligonucleotide:DNA, 2'-O-methyl oligonucleotide:RNA
and RNA:DNA duplexes in solution was also determined.
The Tm was determined by varying the sample temperature
and monitoring the absorbance of the sample solution at 260
nm. The oligonucleotide samples were dissolved in a 0.1 M
NaCl solution with an oligonucleotide concentration of 2
µM. Table 1 summarizes the results of the experiment. The
results show that the hybridization of DNA in solution has
approximately the same Tm as the hybridization of DNA
with a 2'-O-methyl-substituted oligonucleotide analogue.
The results also show that the Tm for the 2'-O-methyl-
substituted oligonucleotide duplex is higher than that for the
corresponding RNA:2'-O-methyl-substituted oligonucleotide
duplex, which is higher than the Tm for the corresponding
DNA:DNA or RNA:DNA duplex.

TABLE 1

Solution Oligonucleotide Melting Experiments
(+) = Target Sequence (5'-CTGAACGGTAGCATCTTGAC-3') (SEQ ID NO: 6)*
(-) = Complementary Sequence (5'-TCAAGATGCTACCGTTCAG-3') (SEQ ID NO: 7)*

Type of Oligonucleotide,    Type of Oligonucleotide,
Target Sequence (+)         Complementary Sequence (-)    Tm (° C.)

DNA(+)                      DNA(-)                        61.6
DNA(+)                      2'OMe(-)                      58.6
2'OMe(+)                    DNA(-)                        61.6
2'OMe(+)                    2'OMe(-)                      78.0
RNA(+)                      DNA(-)                        58.2
RNA(+)                      2'OMe(-)                      73.6

*T refers to thymine for the DNA oligonucleotides, or uracil for the RNA
oligonucleotides.
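
The ordering of duplex stabilities stated in Example 1 can be read directly off Table 1. The following Python sketch simply transcribes the six Tm values from the table and ranks them; the variable and function names are hypothetical, and nothing beyond the tabulated values is assumed.

```python
# Tm values (degrees C) transcribed from Table 1.
# Keys are (target strand type, complementary strand type).
TABLE_1_TM = {
    ("DNA", "DNA"): 61.6,
    ("DNA", "2'OMe"): 58.6,
    ("2'OMe", "DNA"): 61.6,
    ("2'OMe", "2'OMe"): 78.0,
    ("RNA", "DNA"): 58.2,
    ("RNA", "2'OMe"): 73.6,
}


def rank_by_tm(table):
    """Return (duplex, Tm) pairs sorted from most to least stable."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


def delta_vs_dna_duplex(table):
    """Return each duplex's Tm difference relative to the DNA:DNA duplex."""
    reference = table[("DNA", "DNA")]
    return {pair: round(tm - reference, 1) for pair, tm in table.items()}


ranking = rank_by_tm(TABLE_1_TM)
deltas = delta_vs_dna_duplex(TABLE_1_TM)
```

Ranked this way, the 2'-O-methyl:2'-O-methyl duplex comes first, followed by the RNA:2'-O-methyl duplex, with the DNA-containing duplexes grouped several degrees below, which matches the summary given in the text of Example 1.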



Example 2 

Array hybridization experiments with DNA chips and 
oligonucleotide analogue targets 

A variable length DNA probe array on a chip was
designed to discriminate single base mismatches in the 3
corresponding sequences
5'-CTGAACGGTAGCATCTTGAC-3' (SEQ ID NO:6)
(DNA target), 5'-CUGAACGGUAGCAUCUUGAC-3'
(SEQ ID NO:8) (RNA target) and
5'-CUGAACGGUAGCAUCUUGAC-3' (SEQ ID NO:9)
(2'-O-methyl oligonucleotide target), and generated by the
VLSIPS™ procedure. The chip was designed with adjacent
12-mers and 8-mers which overlapped with the 3 target
sequences as shown in Table 2.

TABLE 2

Array Hybridization Experiments

Target 1 (DNA)                5'-CTGAACGGTAGCATCTTGAC-3' (SEQ ID NO: 6)
8-mer probe (complement)
12-mer probe (complement)

Target 2 (RNA)                5'-CUGAACGGUAGCAUCUUGAC-3' (SEQ ID NO: 8)
8-mer probe (complement)
12-mer probe (complement)

Target 3 (2'-O-Me oligo)      5'-CUGAACGGUAGCAUCUUGAC-3' (SEQ ID NO: 9)
8-mer probe (complement)
12-mer probe (complement)

Target oligos were synthesized using standard techniques.
The DNA and 2'-O-methyl oligonucleotide analogue target
oligonucleotides were hybridized to the chip at a
concentration of 10 nM in 5x SSPE at 20° C. in sequential
experiments. Intensity measurements were taken at each
probe position in the 8-mer and 12-mer arrays over time. The
rate of increase in intensity was then plotted for each probe
position. The rate of increase in intensity was similar for
both targets in the 8-mer probe arrays, but the 12-mer probes
hybridized more rapidly to the DNA target oligonucleotide.

Plots of intensity versus probe position were generated for
the RNA, DNA and 2'-O-methyl oligonucleotides to ascertain
mismatch discrimination. The 8-mer probes displayed
similar mismatch discrimination against all targets. The
12-mer probes displayed the highest mismatch discrimination
for the DNA targets, followed by the 2'-O-methyl target,
with the RNA target showing the poorest mismatch
discrimination.

Thermal equilibrium experiments were performed by
hybridizing each of the targets to the chip for 90 minutes at
5° C. temperature intervals. The chip was hybridized with
the target in 5x SSPE at a target concentration of 10 nM.
Intensity measurements were taken at the end of the 90
minute hybridization at each temperature point as described
above. All of the targets displayed similar stability, with
minimal hybridization to the 8-mer probes at 30° C. In
addition, all of the targets showed similar stability in
hybridizing to the 12-mer probes. Thus, the 2'-O-methyl
oligonucleotide target had similar hybridization characteristics to
DNA and RNA targets when hybridized against DNA
probes.

Example 3

2'-O-methyl-substituted oligonucleotide chips

DMT-protected DNA and 2'-O-methyl phosphoramidites
were used to synthesize 8-mer probe arrays on a glass slide
using the VLSIPS™ method. The resulting chip was hybridized
to DNA and RNA targets in separate experiments. The
target sequence, the sequences of the probes on the chip and
the general physical layout of the chip are described in Table
3.

The chip was hybridized to the RNA and DNA targets in
successive experiments. The hybridization conditions used
were 10 nM target, in 5x SSPE. The chip and solution were
heated from 20° C. to 50° C., with a fluorescence measurement
taken at 5 degree intervals as described in SN PCT/
US94/12305. The chip and solution were maintained at each
temperature for 90 minutes prior to fluorescence measurements.
The results of the experiment showed that DNA
probes were equal or superior to 2'-O-methyl oligonucleotide
analogue probes for hybridization to a DNA target, but
that the 2'-O-methyl analogue oligonucleotide probes
showed dramatically better hybridization to the RNA target
than the DNA probes. In addition, the 2'-O-methyl analogue
oligonucleotide probes showed superior mismatch discrimination
of the RNA target compared to the DNA probes. The
difference in fluorescence intensity between the matched and
mismatched analogue probes was greater than the difference
between the matched and mismatched DNA probes, dramatically
increasing the signal-to-noise ratio. FIG. 1 displays
the results graphically (FIGS. 1A and 1B). (M) and (P)
indicate mismatched and perfectly matched probes, respectively.
(FIGS. 1C and 1D) illustrates the fluorescence intensity
versus location on an example chip for the various
probes at 20° C., using DNA and RNA targets, respectively.

TABLE 3 



2-O-methyI Oligonucleotide Analogues on a Chip. 



10 



Target Sequence (DNA): 
Target Sequence (RNA): 
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5'-CTGAACGGTAGCArCTTGAC-3' 
(SEQ ID NO: 6) 

S'-CUGAACGGUAGCAUCUUGAC-^ 
(SEQ ID NO: 8) 

Matching DNA oligonucleotide S'-CTTGCCAT (SEQ ID NO: 10) 
probe {DNA (M)} 
Matching 2*-0-methyl 5-CUUGCCAU (SEQ ID NO: 11) 

oligonucleotide analogue probe 
{2'OMe (M)} 

DNA oligonucleotide probe with S'-CTTGCTAT (SEQ ID NO: 12) 
1 base mismatch {DNA (P)} 

2'-0-mcthyl oligonucleotide 5'-CUUGCUAU (SEQ ID NO: 13) 20 
analogue probe with 1 base 
mismatch {2'OMe (M)} 

SCHEMATIC REPRESENTATION OF 2'-O-METHYL/DNA CHIP

Matching 2'-O-methyl oligonucleotide analogue probe
2'-O-methyl oligonucleotide analogue probe with 1 base mismatch
DNA oligonucleotide probe with 1 base mismatch
Matching DNA oligonucleotide probe

Example 4

Synthesis of oligonucleotide analogues

The reagent MeNPoc-Cl reacts non-selectively with both the 5' and 3' hydroxyls on 2'-O-methyl nucleoside analogues. Thus, to generate high yields of 5'-O-MeNPoc-2'-O-methylribonucleoside analogues for use in oligonucleotide analogue synthesis, the following protection-deprotection scheme was utilized.

The protective group DMT was added to the 5'-O position of the 2'-O-methylribonucleoside analogue in the presence of pyridine. The resulting 5'-O-DMT protected analogue was reacted with TBDMS-Triflate in THF, resulting in the addition of the TBDMS group to the 3'-O of the analogue. The 5'-DMT group was then removed with TCAA to yield a free OH group at the 5' position of the 2'-O-methyl ribonucleoside analogue, followed by the addition of MeNPoc-Cl in the presence of pyridine, to yield 5'-O-MeNPoc-3'-O-TBDMS-2'-O-methyl ribonucleoside analogue. The TBDMS group was then removed by reaction with NaF, and the 3'-OH group was phosphitylated using standard techniques.

Two other potential strategies did not result in high specific yields of 5'-O-MeNPoc-2'-O-methylribonucleoside. In the first, a less reactive MeNPoc derivative was synthesized by reacting MeNPoc-Cl with N-hydroxysuccinimide to yield MeNPoc-NHS. This less reactive photocleavable group (MeNPoc-NHS) was found to react exclusively with the 3' hydroxyl on the 2'-O-methylribonucleoside analogue. In the second strategy, an organotin protection scheme was used. Dibutyltin oxide was reacted with the 2'-O-methylribonucleoside analogue followed by reaction with MeNPoc. Both 5'-O-MeNPoc and 3'-O-MeNPoc 2'-O-methylribonucleoside analogues were obtained.

Example 5 

Hybridization to mixed-sequence oligodeoxynucleotide probes substituted with 2-amino-2'-deoxyadenosine (D)

To test the effect of a 2-amino-2'-deoxyadenosine (D) substitution in a heterogeneous probe sequence, two 4x4 oligodeoxynucleotide arrays were constructed using VLSIPS™ methodology and 5'-O-MeNPOC-protected deoxynucleoside phosphoramidites. Each array was comprised of the following set of probes based on the sequence (3')-CATCGTAGAA-(5') (SEQ ID NO:1):

1. -(HEG)-(3')-CATN1GTAGAA-(5') (SEQ ID NO:14)
2. -(HEG)-(3')-CATCN2TAGAA-(5') (SEQ ID NO:15)
3. -(HEG)-(3')-CATCGN3AGAA-(5') (SEQ ID NO:16)
4. -(HEG)-(3')-CATCGTN4GAA-(5') (SEQ ID NO:17)

where HEG=hexaethyleneglycol linker, and N is either A, G, C or T, so that probes are obtained which contain single mismatches introduced at each of four central locations in the sequence. The first probe array was constructed with all natural bases. In the second array, 2-amino-2'-deoxyadenosine (D) was used in place of adenosine (A). Both arrays were hybridized with a 5'-fluorescein-labeled oligodeoxynucleotide target, (5')-Fl-d(CTGAACGGTAGCATCTTGAC)-(3') (SEQ ID NO:18), which contained a sequence (in bold) complementary to the base probe sequence. The hybridization conditions were: 10 nM target in 5x SSPE buffer at 22° C. with agitation. After 30 minutes, the chip was mounted on the flowcell of a scanning laser confocal fluorescence microscope, rinsed briefly with 5x SSPE buffer at 22° C., and then a surface fluorescence image was obtained.
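The 4x4 probe layout described above (four varied positions, four bases at each) can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the procedure used to design the arrays: the names `probe_set` and `binding_site` are hypothetical, only the all-natural-base array is modeled (the D-for-A substitution of the second array is not), and the fluorescein label on the target is omitted.

```python
BASE_PROBE_3TO5 = "CATCGTAGAA"        # written 3'->5', as in the text (SEQ ID NO:1)
TARGET_5TO3 = "CTGAACGGTAGCATCTTGAC"  # written 5'->3' (SEQ ID NO:18, label omitted)
VARIED_POSITIONS = (3, 4, 5, 6)       # 0-based indices of N1..N4 in the base probe
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def probe_set(base: str = BASE_PROBE_3TO5,
              positions: tuple[int, ...] = VARIED_POSITIONS) -> dict[tuple[int, str], str]:
    """Map (position, substituted base) to the resulting probe sequence."""
    return {
        (pos, n): base[:pos] + n + base[pos + 1:]
        for pos in positions
        for n in "AGCT"
    }


def binding_site(probe_3to5: str) -> str:
    """Complement of a 3'->5' probe, read 5'->3' on the target strand."""
    return "".join(COMPLEMENT[b] for b in probe_3to5)


probes = probe_set()
site = binding_site(BASE_PROBE_3TO5)
perfect = [key for key, seq in probes.items() if seq == BASE_PROBE_3TO5]

print(len(probes), "array cells,", len(perfect), "perfect-match cells")
print("binding site", site, "found at target index", TARGET_5TO3.find(site))
```

Each varied position yields one cell identical to the base probe and three single-mismatch cells, which is the comparison the fluorescence measurements in this example rely on.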

The relative efficiency of hybridization of the target to the complementary and single-base mismatched probes was determined by comparing the average bound surface fluorescence intensity in those regions of the array containing the individual probe sequences. The results (FIG. 3) show that a 2-amino-2'-deoxyadenosine (D) substitution in a heterogeneous probe sequence is a relatively neutral one, with little effect on either the signal intensity or the specificity of DNA-DNA hybridization, under conditions where the target is in excess and the probes are saturated.
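The comparison of average bound fluorescence between array regions can be expressed as a simple ratio. The following is a minimal sketch assuming each region is available as a list of pixel intensities; the function name `relative_hybridization` and all numeric values are hypothetical and are not data from the experiments described here.

```python
from statistics import mean


def relative_hybridization(region_intensities: dict[str, list[float]],
                           reference: str) -> dict[str, float]:
    """Average each region's intensities and express them relative to a reference region."""
    means = {name: mean(values) for name, values in region_intensities.items()}
    ref = means[reference]
    if ref == 0:
        raise ValueError("reference region has zero mean intensity")
    return {name: m / ref for name, m in means.items()}


# Hypothetical pixel intensities for two probe regions (illustrative values only).
example = {
    "match": [100.0, 110.0, 90.0],
    "mismatch": [20.0, 30.0, 25.0],
}
ratios = relative_hybridization(example, reference="match")
print(ratios)
```

A low mismatch-to-match ratio corresponds to good specificity; similar ratios for two arrays would indicate that a substitution is approximately neutral.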

Example 6 

Hybridization to a dA-homopolymer oligodeoxynucleotide probe substituted with 2-amino-2'-deoxyadenosine (D)

The following experiment was performed to compare the hybridization of 2'-deoxyadenosine containing homopolymer arrays with 2-amino-2'-deoxyadenosine homopolymer arrays. The experiment was performed on two arrays containing 11-mer oligodeoxynucleotide probes. Two 11-mer oligodeoxynucleotide probe sequences were synthesized on a chip using 5'-O-MeNPOC-protected nucleoside phosphoramidites and standard VLSIPS™ methodology.

The sequence of the first probe was: (HEG)-(3')-d(AAAAANAAAAA)-(5') (SEQ ID NO:19); where HEG=hexaethyleneglycol linker, and N is either A, G, C or T. The second probe was the same, except that dA was replaced by 2-amino-2'-deoxyadenosine (D). The chip was hybridized with a 5'-fluorescein-labeled oligodeoxynucleotide target, (5')-Fl-d(TTTTTGTTTTT)-(3') (SEQ ID NO:20), which contained a sequence complementary to the probe sequences where N=C. Hybridization conditions were 10 nM target in 5x SSPE buffer at 22° C. with agitation. After 15 minutes, the chip was mounted on the flowcell of a scanning laser confocal fluorescence microscope, rinsed briefly with 5x SSPE buffer at 22° C. (low stringency), and a surface fluorescence image was obtained. Hybridization to the chip was continued for another 5 hours, and a surface fluorescence image was acquired again. Finally, the chip was washed briefly with 0.5x SSPE (high stringency), then with 5x SSPE, and re-scanned.

The relative efficiency of hybridization of the target to the complementary and single-base mismatched probes was determined by comparing the average bound surface fluorescence intensity in those regions of the array containing the individual probe sequences. The results (FIG. 4) indicate that substituting 2'-deoxyadenosine with 2-amino-2'-deoxyadenosine in a d(A)n homopolymer probe sequence results in a significant enhancement in specific hybridization to a complementary oligodeoxynucleotide sequence.

Example 7

Hybridization to alternating A-T oligodeoxynucleotide probes substituted with 5-propynyl-2'-deoxyuridine (P) and 2-amino-2'-deoxyadenosine (D)

Commercially available 5'-DMT-protected 2'-deoxynucleoside/nucleoside-analog phosphoramidites (Glen Research) were used to synthesize two decanucleotide probe sequences on separate areas on a chip using a modified VLSIPS™ procedure. In this procedure, a glass substrate is initially modified with a terminal-MeNPOC-protected hexaethyleneglycol linker. The substrate was exposed to light through a mask to remove the protecting group from the linker in a checkerboard pattern. The first probe sequence was then synthesized in the exposed region using DMT-phosphoramidites with acid-deprotection cycles, and the sequence was finally capped with (MeO)2PNiPr2/tetrazole followed by oxidation. A second checkerboard exposure in a different (previously unexposed) region of the chip was then performed, and the second probe sequence was synthesized by the same procedure. The sequence of the first "control" probe was: -(HEG)-(3')-CGCGCCGCGC-(5') (SEQ ID NO:21); and the sequence of the second probe was one of the following:

1. -(HEG)-(3')-d(ATATAATATA)-(5') (SEQ ID NO:22)
2. -(HEG)-(3')-d(APAPAAPAPA)-(5') (SEQ ID NO:23)
3. -(HEG)-(3')-d(DTDTDDTDTD)-(5') (SEQ ID NO:24)
4. -(HEG)-(3')-d(DPDPDDPDPD)-(5') (SEQ ID NO:25)

where HEG=hexaethyleneglycol linker, A=2'-deoxyadenosine, T=thymidine, D=2-amino-2'-deoxyadenosine, and P=5-propynyl-2'-deoxyuridine. Each chip was then hybridized in a solution of a fluorescein-labeled oligodeoxynucleotide target, (5')-Fluorescein-d(TATATTATAT)-(HEG)-d(GCGCGGCGCG)-(3') (SEQ ID NO:26 and SEQ ID NO:27), which is complementary to both the A/T and G/C probes. The hybridization conditions were: 10 nM target in 5x SSPE buffer at 22° C. with gentle shaking. After 3 hours, the chip was mounted on the flowcell of a scanning laser confocal fluorescence microscope, rinsed briefly with 5x SSPE buffer at 22° C., and then a surface fluorescence image was obtained. Hybridization to the chip was continued overnight (total hybridization time=20 hr), and a surface fluorescence image was acquired again.

The relative efficiency of hybridization of the target to the A/T and substituted A/T probes was determined by comparing the average surface fluorescence intensity bound to those parts of the chip containing the A/T or substituted probe to the fluorescence intensity bound to the G/C control probe sequence. The results (FIG. 5) show that 5-propynyl-dU and 2-amino-dA substitution in an A/T-rich probe significantly enhances the affinity of an oligonucleotide analogue for complementary target sequences. The unsubstituted A/T-probe bound only 20% as much target as the all-G/C-probe of the same length, while the D- & P-substituted A/T probe bound nearly as much (90%) as the G/C-probe. Moreover, the kinetics of hybridization are such that, at early times, the amount of target bound to the substituted A/T probes exceeds that which is bound to the all-G/C probe.

Example 8

Hybridization to oligodeoxynucleotide probes substituted with 7-deaza-2'-deoxyguanosine (ddG) and 2'-deoxyinosine (dI)

A 16x64 oligonucleotide array was constructed using VLSIPS™ methodology, with 5'-O-MeNPOC-protected nucleoside phosphoramidites, including the analogs ddG and dI. The array was comprised of the set of probes represented by the following sequence: -(linker)-(3')-d(A T G T T G1 G2 G3 G4 G5 C G G G T)-(5'); (SEQ ID NO:28) where underlined bases are fixed, and the five internal deoxyguanosines (G1-5) are substituted with G, ddG, dI, and T in all possible (1024 total) combinations. A complementary oligonucleotide target, labeled with fluorescein at the 5'-end: (5')-Fl-d(CAATACAACCCCCGCCCATCC)-(3') (SEQ ID NO:29), was hybridized to the array. The hybridization conditions were: 5 nM target in 6x SSPE buffer at 22° C. with shaking. After 30 minutes, the chip was mounted on the flowcell of an Affymetrix scanning laser confocal fluorescence microscope, rinsed once with 0.25x SSPE buffer at 22° C., and then a surface fluorescence image was acquired.

The "efficiency" of target hybridization to each probe in the array is proportional to the bound surface fluorescence intensity in the region of the chip where the probe was synthesized. The relative values for a subset of probes (those containing dG→ddG and dG→dI substitutions only) are shown in FIG. 6. Substitution of guanosine with 7-deazaguanosine within the internal run of five G's results in a significant enhancement in the fluorescence signal intensity which measures hybridization. Deoxyinosine substitutions also enhance hybridization to the probe, but to a lesser extent. In this example, the best overall enhancement is realized when the dG "run" is ~40-60% substituted with 7-deaza-dG, with the substitutions distributed evenly throughout the run (i.e., alternating dG/deaza-dG).

Example 9

Synthesis of 5'-O-MeNPOC-2'-deoxyinosine-3'-(N,N-diisopropyl-2-cyanoethyl)phosphoramidite

2'-deoxyinosine (5.0 g, 20 mmole) was dissolved in 50 ml of dry DMF, and 100 ml dry pyridine was added and evaporated three times to dry the solution. Another 50 ml pyridine was added, the solution was cooled to -20° C. under argon, and 13.8 g (50 mmole) of MeNPOC-chloride in 20 ml dry DCM was then added dropwise with stirring over [illegible] minutes. After 60 minutes, the cold bath was removed, and the solution was allowed to stir overnight at room temperature. Pyridine and DCM were removed by evaporation, 500 ml of ethyl acetate was added, and the solution was washed with water and then with brine. The aqueous washes were combined and back-extracted twice with ethyl acetate, and then all of the organic layers were combined, dried with Na2SO4, and evaporated under vacuum. The product was recrystallized from DCM to obtain 5.0 g (50% yield) of pure 5'-O-MeNPOC-2'-deoxyinosine as a yellow solid (99% purity, according to 1H-NMR and HPLC analysis).

The MeNPOC-nucleoside (2.5 g, 5.1 mmole) was suspended in 60 ml of dry CH3CN and phosphitylated with 2-cyanoethyl-N,N,N',N'-tetraisopropylphosphorodiamidite ([illegible]; 5.5 mmole) and 0.47 g (2.7 mmole) of diisopropylammonium tetrazolide, according to the published procedure of Barone, et al. (Nucleic Acids Res. (1984) [illegible]). The crude phosphoramidite was purified by flash chromatography on silica gel (90:8:2 DCM-MeOH-Et3N), co-evaporated twice with anhydrous acetonitrile and dried under vacuum for ~24 hours to obtain 2.8 g (80%) of the pure product as a yellow solid (98% purity as determined by 1H/31P-NMR and HPLC).
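The combinatorial probe set of Example 8 (five internal positions, each taking one of G, ddG, dI or T) can be enumerated directly, which also shows why a 16x64 array holds exactly one cell per combination. The sketch below is illustrative only; the names `CHOICES`, `all_runs` and `fraction_substituted` are hypothetical, and the fixed flanking bases of the probe are not modeled.

```python
from itertools import product

CHOICES = ("G", "ddG", "dI", "T")  # options at each of the five internal positions
RUN_LENGTH = 5


def fraction_substituted(run: tuple[str, ...], analogue: str = "ddG") -> float:
    """Fraction of positions in the run occupied by the given analogue."""
    return sum(1 for base in run if base == analogue) / len(run)


all_runs = list(product(CHOICES, repeat=RUN_LENGTH))

# Subset with dG->ddG and dG->dI substitutions only (no T at any position).
no_t_runs = [run for run in all_runs if "T" not in run]

# The two strictly alternating dG/ddG patterns mentioned in the text.
alternating = [("G", "ddG", "G", "ddG", "G"), ("ddG", "G", "ddG", "G", "ddG")]

print(len(all_runs), 16 * 64)
print(len(no_t_runs))
print([fraction_substituted(run) for run in alternating])
```

The two alternating patterns correspond to 40% and 60% substitution, the range the text identifies as giving the best overall enhancement.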
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continued 



(ii) MOLECULE TYPE: RNA
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..20
(D) OTHER INFORMATION: /note= "Target RNA sequence"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
CUGAACGGUA GCAUCUUGAC 20



(2) INFORMATION FOR SEQ ID NO: 9:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-O-methyl oligonucleotide"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 9
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 11
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 12
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 13
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 14
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 15
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 16
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 17
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 18
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 19
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 20
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..20
(D) OTHER INFORMATION: /note= "Target 2'-O-methyl oligonucleotide sequence"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:
NNNNNNNNNN NNNNNNNNNN 20




(2) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..8
(D) OTHER INFORMATION: /note= "Matching DNA oligonucleotide probe"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
CTTGCCAT

(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-O-methyl oligonucleotide"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..8
(D) OTHER INFORMATION: /note= "Matching 2'-O-methyl oligonucleotide analogue probe"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
NNNNNNNN



(2) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..8
(D) OTHER INFORMATION: /note= "DNA oligonucleotide probe with 1 base mismatch"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
CTTGCTAT



(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-O-methyl oligonucleotide"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= gm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= cm
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "2'-O-methyladenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= um
(ix) FEATURE:
(A) NAME/KEY: -
(B) LOCATION: 1..8
(D) OTHER INFORMATION: /note= "2'-O-methyl oligonucleotide analogue probe with 1 base mismatch"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
NNNNNNNN



(2) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
AAGATGNTAN 10



(2) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
AAGATNCTAN 10



(2) INFORMATION FOR SEQ ID NO: 16:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
AAGANGCTAN 10



(2) INFORMATION FOR SEQ ID NO: 17:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
AAGNTGCTAN



(2) INFORMATION FOR SEQ ID NO: 18:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified at the 5' phosphate group with a fluorescein molecule"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
NTGAACGGTA GCATCTTGAC 20



(2) INFORMATION FOR SEQ ID NO: 19:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 11
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = adenine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
AAAAANAAAA N



(2) INFORMATION FOR SEQ ID NO: 20:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = thymine covalently modified at the 5' phosphate group with a fluorescein molecule"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
NTTTTGTTTT T 11

(2) INFORMATION FOR SEQ ID NO: 21:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = cytosine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
CGCGCCGCGN



(2) INFORMATION FOR SEQ ID NO: 22:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-deoxynucleoside/nucleoside analogue decanucleotide probe"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
NTNTNNTNTN 10






(2) INFORMATION FOR SEQ ID NO: 23:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-deoxynucleoside/nucleoside analogue decanucleotide probe"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 9
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2'-deoxyadenosine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
NNNNNNNNNN 10

(2) INFORMATION FOR SEQ ID NO: 24:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-deoxynucleoside/nucleoside analogue decanucleotide probe"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
NTNTNNTNTN



(2) INFORMATION FOR SEQ ID NO: 25:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: other nucleic acid
(A) DESCRIPTION: /desc = "2'-deoxynucleoside/nucleoside analogue decanucleotide probe"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 2
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 3
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 4
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 5
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 6
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 7
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 8
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 9
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 5-propynyl-2'-deoxyuridine"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = 2-amino-2'-deoxyadenosine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
NNNNNNNNNN



(2) INFORMATION FOR SEQ ID NO: 26:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = thymine covalently modified at the 5' hydroxyl group with a fluorescein molecule"
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 10
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = thymine covalently modified at the 3' phosphate group with a hexaethyleneglycol (HEG) linker which is covalently bound to the 5' phosphate group of the 5' guanine (N in pos. 1) of SEQ ID NO:27"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
NATATTATAN 10



(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA
(ix) FEATURE:
(A) NAME/KEY: modified_base
(B) LOCATION: 1
(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = guanine covalently modified at the 5' phosphate group with a hexaethyleneglycol (HEG) linker which is covalently bound to the 3' phosphate group of the 3' thymine (N in pos. 10) of SEQ ID NO:26"
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
NCGCGGCGCG 10



(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(ix) FEATURE: 

(A) NAME/KEY: modified_base 

(B) LOCATION: 6..10

(D) OTHER INFORMATION: /mod_base= OTHER
/note= "N = guanine (G),
2',3'-dideoxyguanine (ddG),
2'-deoxyinosine (dI) or thymine (T)"

(ix) FEATURE: 

(A) NAME/KEY: modified_base

(B) LOCATION: 15

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = cytosine covalently modified
at the 5' phosphate group with a
hexaethyleneglycol (HEG) linker"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 

TGGGCNNNNN TTGTN 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 






(ii) MOLECULE TYPE: DNA

(ix) FEATURE:

(A) NAME/KEY: modified_base

(B) LOCATION: 1

(D) OTHER INFORMATION: /mod_base= OTHER

/note= "N = cytosine covalently modified
at the 5' phosphate group with a
fluorescein molecule"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

NAATACAACC CCCGCCCATC C 21 



What is claimed is: 

1. A composition for analyzing interactions between
oligonucleotide targets and oligonucleotide probes comprising
an array of a plurality of oligonucleotide analogue probes
having different sequences, wherein said oligonucleotide
analogue probes are coupled to a solid substrate at known
locations and wherein said plurality of oligonucleotide
analogue probes are selected to bind to complementary
oligonucleotide targets with a similar hybridization stability
across the array.

2. The composition of claim 1, wherein at least one of said 
oligonucleotide analogue probes is selected to maintain 
hybridization specificity or mismatch discrimination with 
said complementary oligonucleotide targets.

3. The composition of claim 1, wherein at least one of said 
oligonucleotide analogue probes has increased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

4. The composition of claim 1, wherein at least one of said 
oligonucleotide analogue probes has decreased the thermal 
stability between said oligonucleotide analogue probe and
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

5. The composition of claim 2, wherein at least one of said
oligonucleotide analogue probes has increased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said
oligonucleotide analogue probe anneals. 

6. The composition of claim 2, wherein at least one of said 
oligonucleotide analogue probes has decreased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to 
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

7. The composition of claims 1-5 or 6, wherein said solid 
substrate is selected from the group consisting of silica,
polymeric materials, glass, beads, chips, and slides. 

8. The composition of claims 1-5 or 6, wherein said 
composition comprises an array of oligonucleotide analogue 
probes 5 to 20 nucleotides in length. 

9. The composition of claims 1-5 or 6, wherein said array
of oligonucleotide analogue probes comprises a nucleoside 
analogue with the formula 




the nucleoside analogue is not a naturally occurring DNA
or RNA nucleoside;

R1 is selected from the group consisting of hydrogen,
methyl, hydroxyl, alkoxy, alkylthio, halogen, cyano,
and azido;

R2 is selected from the group consisting of hydrogen,
methyl, hydroxyl, alkoxy, alkylthio, halogen, cyano,
and azido;

Y is a heterocyclic moiety; 

and wherein said nucleoside analogue is incorporated into 
the oligonucleotide analogue by attachment to a 3' 
hydroxyl of the nucleoside analogue, to a 5' hydroxyl of 
the nucleoside analogue, or both the 3' nucleoside and 
the 5' hydroxyl of the nucleoside analogue. 

10. The composition of claims 1-5 or 6, wherein said
array of oligonucleotide analogue probes comprises a nucleoside
analogue with the formula 




wherein: 

the nucleoside analogue is not a naturally occurring DNA 
or RNA nucleoside; 

R1 is selected from the group consisting of hydrogen,
hydroxyl, methyl, methoxy, ethoxy, propoxy, allyloxy, 
propargyloxy, Fluorine, Chlorine, and Bromine; 

R2 is selected from the group consisting of hydrogen,
hydroxyl, methyl, methoxy, ethoxy, propoxy, allyloxy,
propargyloxy, Fluorine, Chlorine, and Bromine; and

Y is a base selected from the group consisting of
purines, purine analogues, pyrimidines, pyrimidine
analogues, 3-nitropyrrole and 5-nitroindole;

and wherein said nucleoside analogue is incorporated into 
the oligonucleotide analogue by attachment to a 3'
hydroxyl of the nucleoside analogue, to a 5' hydroxyl of
the nucleoside analogue, or both the 3' nucleoside and
the 5' hydroxyl of the nucleoside analogue.

11. The composition of claims 1-5 or 6, wherein each
probe of said plurality of oligonucleotide analogue probes 
has at least one oligonucleotide analogue, and wherein at 
least one of said oligonucleotide analogues comprises a 
peptide nucleic acid. 

12. The composition of claims 1-5 or 6, wherein at least 
one of said plurality of oligonucleotide analogue probes said 
array of oligonucleotide analogue probes is resistant to 
RNAase A. 

13. The composition of claims 1-5 or 6, wherein said 
solid substrate is attached to over 1000 different oligonucle- 
otide analogue probes. 

14. The composition of claims 1-5 or 6, wherein each
probe of said plurality of oligonucleotide analogue probes 
has at least one oligonucleotide analogue, and wherein at 
least one of said oligonucleotide analogues comprises
2'-O-methyl nucleotides.

15. The composition of claims 1-5 or 6, wherein said
array of oligonucleotide analogue probes and said solid
substrate comprises a plurality of different oligonucleotide
analogue probes, each oligonucleotide analogue probes
having the formula:

Y-L1-X1-L2-X2

wherein,

Y is a solid substrate;

X1 and X2 are complementary oligonucleotides
containing at least one nucleotide analogue;

L1 is a spacer;

L2 is a linking group having sufficient length such that X1
and X2 form a double-stranded oligonucleotide.

16. The composition of claim 15, wherein said composition
comprises a library of unimolecular double-stranded
oligonucleotide analogue probes. 

17. The composition of claims 1-5 or 6, wherein said
array of oligonucleotide analogue probes comprises a
conformationally restricted array of oligonucleotide analogue
probes with the formula:

-X11-Z-X12

wherein X11 and X12 are complementary oligonucleotides
or oligonucleotide analogues and Z is a presented
moiety.

18. The composition of claims 1-5 or 6, wherein each 
probe of said plurality of oligonucleotide analogue probes 
has at least one oligonucleotide analogue, and wherein at 
least one of said oligonucleotide analogues comprises a
nucleotide with a base selected from the group of bases 
consisting of 5-propynyluracil, 5-propynylcytosine, 
2-aminoadenine, 7-deazaguanine, 2-aminopurine,
8-aza-7-deazaguanine, 1H-purine, and hypoxanthine.

19. The composition of claims 1-5 or 6, wherein said
plurality of oligonucleotide analogue probes are coupled to 
said solid substrate by light-directed chemical coupling. 

20. The composition of claim 19, wherein said solid 
substrate is derivitized with a silane reagent prior to
synthesis of said plurality of oligonucleotide analogue probes.

21. The composition of claims 1-5 or 6, wherein said 
plurality of oligonucleotide analogue probes are coupled to 
said solid substrate by flowing oligonucleotide analogue 
reagents over known locations of the solid substrate. 

22. The composition of claim 21, wherein said solid
substrate is derivitized with a silane reagent prior to syn- 
thesis of said plurality of oligonucleotide analogue probes. 




23. The composition of claims 1-5 or 6, wherein at least 
one of plurality of said oligonucleotide analogue probes 
forms a first duplex with a target oligonucleotide sequence, 
wherein said oligonucleotide analogue probe has a
corresponding oligonucleotide sequence that forms a second
duplex with said target oligonucleotide sequence, wherein 
said second duplex is rich in A-T or G-C nucleotide pairs, 
and wherein said oligonucleotide analogue probe has at least 
one nucleotide analogue in place of an A, T, G, or C 
nucleotide of said corresponding oligonucleotide sequence 
at a position within said oligonucleotide analogue probe 
such that said first duplex has an increased hybridization 
stability than said second duplex. 

24. The composition of claim 23, wherein said oligo- 
nucleotide analogue probe contains fewer bases than said 
corresponding oligonucleotide sequence. 

25. The composition of claims 1-5 or 6, wherein said 
oligonucleotide analogue probe forms a first duplex with a 
target oligonucleotide sequence, wherein said oligonucle- 
otide analogue probe has a corresponding oligonucleotide 
sequence that forms a second duplex with said target poly- 
nucleotide sequence, and wherein said oligonucleotide ana- 
logue probe is shorter than said corresponding polynucle- 
otide sequence. 

26. A composition for analyzing the interaction between 
an oligonucleotide target and an oligonucleotide probe com- 
prising an array of a plurality of oligonucleotide probes 
having different sequences hybridized to complementary 
oligonucleotide analogue targets, wherein said oligonucle- 
otide analogue targets bind to complementary oligonucle- 
otide probes with a similar hybridization stability across the 
array. 

27. The composition of claim 26, wherein at least one of 
said oligonucleotide analogue target is selected to maintain 
hybridization specificity or mismatch discrimination with 
said complementary oligonucleotide probes. 

28. The composition of claim 26, wherein at least one of 
said oligonucleotide analogue targets has increased the 
thermal stability between said oligonucleotide analogue 
target and said complementary oligonucleotide probe as 
compared to an oligonucleotide target that is the perfect 
complement to the complementary oligonucleotide probe 
with which said oligonucleotide analogue target anneals. 

29. The composition of claim 26, wherein at least one of 
said oligonucleotide analogue targets has decreased the 
thermal stability between said oligonucleotide analogue 
target and said complementary oligonucleotide probe as 
compared to an oligonucleotide target that is the perfect 
complement to the complementary oligonucleotide probe 
with which said oligonucleotide analogue target anneals. 

30. The composition of claim 27, wherein at least one of 
said oligonucleotide analogue targets has increased the 
thermal stability between said oligonucleotide analogue 
target and said complementary oligonucleotide probe as 
compared to an oligonucleotide target that is the perfect 
complement to the complementary oligonucleotide probe 
with which said oligonucleotide analogue target anneals. 

31. The composition of claim 27, wherein at least one of 
said oligonucleotide analogue targets has decreased the 
thermal stability between said oligonucleotide analogue 
target and said complementary oligonucleotide probe as 
compared to an oligonucleotide target that is the perfect 
complement to the complementary oligonucleotide probe 
with which said oligonucleotide analogue target anneals. 

32. The composition of claims 26-30 or 31, wherein the 
oligonucleotide analogue target is a PCR amplicon. 

33. The composition of claims 26-30 or 31, wherein at 
least one of said plurality of oligonucleotide probes
comprise at least one oligonucleotide analogue.

34. The composition of claims 26-30 or 31, wherein at
least one target oligonucleotide analogue acid is an RNA
nucleic acid.

35. A method analyzing interactions between an
oligonucleotide target and an oligonucleotide probe, comprising
the steps of:

(a). synthesizing an oligonucleotide analogue array
comprising a plurality of oligonucleotide analogue probes
having different sequences, wherein said oligonucleotide
analogue probes are coupled to a solid substrate
at known locations, said solid substrate having a surface;

(b). exposing said oligonucleotide analogue probe array to
a plurality of oligonucleotide targets under hybridization
conditions such that said plurality of oligonucleotide
analogue probes bind to complementary oligonucleotide
targets with a similar hybridization stability
across the array; and

(c). determining whether an oligonucleotide analogue
probe of said oligonucleotide analogue probe array
binds to at least one of said target nucleic acids.

36. The method in accordance of claim 35, wherein at
least one of said oligonucleotide analogue probes is selected
to maintain hybridization specificity or mismatch
discrimination with said complementary oligonucleotide targets.

37. The method in accordance of claim 35, wherein at
least one of said oligonucleotide analogue probes has
increased the thermal stability between said oligonucleotide
analogue probe and said complementary oligonucleotide
target as compared to an oligonucleotide probe that is the
perfect complement to the complementary oligonucleotide
target with which said oligonucleotide analogue probe
anneals.

38. The method in accordance of claim 35, wherein at
least one of said oligonucleotide analogue probes has
decreased the thermal stability between said oligonucleotide
analogue probe and said complementary oligonucleotide
target as compared to an oligonucleotide probe that is the
perfect complement to the complementary oligonucleotide
target with which said oligonucleotide analogue probe
anneals.

39. The method in accordance of claim 36, wherein at
least one of said oligonucleotide analogue probes has
increased the thermal stability between said oligonucleotide
analogue probe and said complementary oligonucleotide
target as compared to an oligonucleotide probe that is the
perfect complement to the complementary oligonucleotide
target with which said oligonucleotide analogue probe
anneals.

40. The method in accordance of claim 36, wherein at
least one of said oligonucleotide analogue probes has
decreased the thermal stability between said oligonucleotide
analogue probe and said complementary oligonucleotide
target as compared to an oligonucleotide probe that is the
perfect complement to the complementary oligonucleotide
target with which said oligonucleotide analogue probe
anneals.

41. The method of claims 35-39 or 40, wherein said
oligonucleotide target is selected from the group comprising
genomic DNA, cDNA, unspliced RNA, mRNA, and rRNA.

42. The method of claims 35-39 or 40, wherein said target
nucleic acid is amplified prior to said hybridization step.

43. The method of claims 35-39 or 40, wherein said
plurality of oligonucleotide analogue probes is synthesized
on said solid support by light-directed synthesis.

44. The method of claims 35-39 or 40, wherein said
plurality of said oligonucleotide analogue probes is
synthesized on said solid support by causing oligonucleotide
analogue synthetic reagents to flow over known locations of
said solid support.

45. The method of claims 35-39 or 40, wherein said step
(a), comprises the steps of:

i). forming a plurality of channels adjacent to the surface
of said substrate;

ii). placing selected reagents in said channels to
synthesize oligonucleotide analogue probes at known
locations; and

iii). repeating steps i). and ii). thereby forming an array of
oligonucleotide analogue probes having different
sequences at known locations on said substrate.

46. The method of claims 35-39 or 40, wherein said solid
substrate is selected from the group consisting of beads,
[illegible].

47. The method of claims 35-39 or 40, wherein said solid
substrate is comprised of materials from the group
consisting of silica, polymers and glass.

48. The method of claims 35-39 or 40, wherein the
oligonucleotide analogue probes of said array are
synthesized using photoremovable protecting groups.

49. The method of claims 35-39 or 40, further comprising
selectively incorporating MeNPoc onto the 3' or 5' hydroxyl
of at least one nucleoside analogue and selectively
incorporating said nucleoside analogue into at least one of said
oligonucleotide analogue probes.

50. The method of claims 35-39 or 40, wherein at least
one of said oligonucleotide analogue probes is synthesized
from phosphoramidite nucleoside reagents.

51. A method of detecting an oligonucleotide target,
comprising enzymatically copying an oligonucleotide target
using at least one nucleotide analogue, thereby producing
multiple oligonucleotide analogue targets, selecting said
oligonucleotide analogue targets such that said oligonucleotide
analogue targets bind to the complementary
oligonucleotide probes coupled to a solid surface at known
locations of an array with a similar hybridization stability
across the array, hybridizing the oligonucleotide analogue
targets to complementary oligonucleotide probes, and
detecting whether at least one of said oligonucleotide
analogue targets binds to said complementary oligonucleotide
acid probe.

52. The method of claim 51, wherein at least one of said
oligonucleotide analogue targets is selected to maintain
hybridization specificity or mismatch discrimination with
said complementary oligonucleotide probes.

53. The method of claim 51, wherein at least one of said
oligonucleotide analogue targets has increased the thermal
stability between said oligonucleotide analogue target and
said complementary oligonucleotide probe as compared to
an oligonucleotide target that is the perfect complement to
the complementary oligonucleotide probe with which said
oligonucleotide analogue target anneals.

54. The method of claim 51, wherein at least one of said
oligonucleotide analogue targets has decreased the thermal
stability between said oligonucleotide analogue target and
said complementary oligonucleotide probe as compared to
an oligonucleotide target that is the perfect complement to
the complementary oligonucleotide probe with which said
oligonucleotide analogue target anneals.

55. The method of claim 52, wherein at least one of said
oligonucleotide analogue targets has increased the thermal
stability between said oligonucleotide analogue target and
said complementary oligonucleotide probe as compared to
an oligonucleotide target that is the perfect complement to
the complementary oligonucleotide probe with which said
oligonucleotide analogue target anneals.

56. The method of claim 52, wherein at least one of said
oligonucleotide analogue targets has decreased the thermal 
stability between said oligonucleotide analogue target and 
said complementary oligonucleotide probe as compared to 
an oligonucleotide target that is the perfect complement to
the complementary oligonucleotide probe with which said 
oligonucleotide analogue target anneals. 

57. The method of claims 51-55 or 56, wherein the 
oligonucleotide probe array comprises at least one
oligonucleotide analogue probe which is complementary to at
least one of said oligonucleotide analogue targets. 

58. A method of making an array of oligonucleotide 
probes, comprising providing a plurality of oligonucleotide 
analogue probes having at least one oligonucleotide 
analogue, said oligonucleotide analogue probes having
different sequences at known locations on an array, selecting
the oligonucleotide analogue probes to hybridize with
complementary oligonucleotide target sequences under
hybridization conditions such that said oligonucleotide
analogue probes bind to complementary oligonucleotide targets
with a similar hybridization stability across the array.

59. The method of claim 58, wherein at least one of said 
oligonucleotide analogue probes is selected to maintain 
hybridization specificity or mismatch discrimination with 
said complementary oligonucleotide targets.

60. The method of claim 58, wherein at least one of said 
oligonucleotide analogue probes has increased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

61. The method of claim 58, wherein at least one of said 
oligonucleotide analogue probes has decreased the thermal 
stability between said oligonucleotide analogue probe and
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said 
oligonucleotide analogue probe anneals. 

62. The method of claim 59, wherein at least one of said
oligonucleotide analogue probes has increased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said
oligonucleotide analogue probe anneals. 

63. The method of claim 59, wherein at least one of said 
oligonucleotide analogue probes has decreased the thermal 
stability between said oligonucleotide analogue probe and 
said complementary oligonucleotide target as compared to
an oligonucleotide probe that is the perfect complement to
the complementary oligonucleotide target with which said
oligonucleotide analogue probe anneals. 

64. The method in accordance with claims 58-62, or 63, 
further comprising incorporating at least one oligonucle- 
otide analogue into at least one of the oligonucleotide 
analogue probes of the array to reduce or prevent the
formation of secondary structure in the oligonucleotide of 
the array.

65. The method in accordance with claims 58-62, or 63, 
further comprising incorporating at least one oligonucle- 
otide analogue into at least one of the oligonucleotide target 
to reduce or prevent the formation of secondary structure in 
the target polynucleotide sequence. 

66. The method in accordance with claims 58-62, or 63, 
further comprising incorporating at least one oligonucle- 
otide analogue into at least one of the oligonucleotide 
analogue probes of the array to create secondary structure in 
the oligonucleotide of the array. 

67. The method in accordance with claims 58-62, or 63, 
further comprising incorporating a base selected from the 
group consisting of 5-propynyluracil, 5-propynylcytosine, 
2-aminoadenine, 7-deazaguanine, 2-aminopurine,
8-aza-7-deazaguanine, 1H-purine, and hypoxanthine into the
oligonucleotide analogue probes of the array.

68. The method of claim 67 further comprising selecting 
said at least one oligonucleotide analogue such that the 
oligonucleotide analogue probe is a homopolymer. 

69. The method in accordance with claims 58-62, or 63, 
further comprising selecting said at least one oligonucleotide 
analogue from the group consisting essentially of
oligonucleotide analogues comprising 2'-O-methyl nucleotides
and oligonucleotides comprising a base selected from the
group of bases consisting of 5-propynyluracil,
5-propynylcytosine, 7-deazaguanine, 2-aminoadenine,
8-aza-7-deazaguanine, 1H-purine, and hypoxanthine.

70. The method in accordance with claims 58-62 or 63, 
further comprising selecting said at least one oligonucleotide 
analogue such that oligonucleotide analogue probes com- 
prises at least one peptide nucleic acid. 

71. The method in accordance with claims 58-62, or 63, 
further comprising selecting said at least one oligonucleotide 
analogue to increase image brightness when the oligonucle- 
otide target and the oligonucleotide analogue probe hybrid- 
ize in the presence of a fluorescent indicator, in comparison 
to a oligonucleotide probe without oligonucleotide analogs. 

72. The method in accordance with claims 58-62, or 63, 
further comprising providing said plurality of oligonucle- 
otide analogue probes in an array with at least 1000 other 
oligonucleotide analogue probes. 
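
The requirement in the claims above that probes bind "with a similar hybridization stability across the array" can be illustrated with a rough calculation. The sketch below is not taken from the patent: it applies the Wallace rule, a coarse textbook estimate for short unmodified DNA duplexes, to SEQ ID NO:26 and SEQ ID NO:27 with each N replaced by the base named in its /note entry and all covalent modifications ignored. The function name `wallace_tm` and the dictionary layout are illustrative assumptions.

```python
def wallace_tm(seq: str) -> int:
    """Estimate duplex melting temperature (deg C) with the Wallace rule.

    Tm ~= 2 * (A + T) + 4 * (G + C). This is a crude approximation for
    short, unmodified oligonucleotides; it ignores analogues and linkers.
    """
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("sequence must contain only A, C, G and T")
    return 2 * at + 4 * gc


# SEQ ID NO:26 and NO:27 with N resolved to the unmodified base (assumption).
probes = {
    "SEQ ID NO:26 (A-T rich)": "TATATTATAT",
    "SEQ ID NO:27 (G-C rich)": "GCGCGGCGCG",
}
tms = {name: wallace_tm(seq) for name, seq in probes.items()}

# Difference between the most and least stable duplex in this toy "array".
spread = max(tms.values()) - min(tms.values())
```

Under these assumptions the two ten-mers differ substantially in estimated stability, which is the kind of gap that substituting nucleotide analogues into A-T rich or G-C rich probes is meant to narrow.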

The present inventions relate to the synthesis and
placement of materials at known locations. In particular, one
embodiment of the inventions provides a method and
associated apparatus for preparing diverse chemical sequences at
known locations on a single substrate surface. The
inventions may be applied, for example, in the field of preparation
of oligomer, peptide, nucleic acid, oligosaccharide,
phospholipid, polymer, or drug congener preparation,
especially to create sources of chemical diversity for use in
screening for biological activity.

The relationship between structure and activity of
molecules is a fundamental issue in the study of biological
systems. Structure-activity relationships are important in
understanding, for example, the function of enzymes, the
ways in which cells communicate with each other, as well as
cellular control and feedback systems.

Certain macromolecules are known to interact and bind to
other molecules having a very specific three-dimensional
spatial and electronic distribution. Any large molecule
having such specificity can be considered a receptor, whether it
is an enzyme catalyzing hydrolysis of a metabolic
intermediate, a cell-surface protein mediating membrane
transport of ions, a glycoprotein serving to identify a
particular cell to its neighbors, an IgG-class antibody
circulating in the plasma, an oligonucleotide sequence of DNA in
the nucleus, or the like. The various molecules which
receptors selectively bind are known as ligands.

Many assays are available for measuring the binding
affinity of known receptors and ligands, but the information
which can be gained from such experiments is often limited
by the number and type of ligands which are available.
Novel ligands are sometimes discovered by chance or by
application of new techniques for the elucidation of
molecular structure, including x-ray crystallographic analysis and
recombinant genetic techniques for proteins.

Small peptides are an exemplary system for exploring the
relationship between structure and function in biology. A
peptide is a sequence of amino acids. When the twenty
naturally occurring amino acids are condensed into
polymeric molecules they form a wide variety of
three-dimensional configurations, each resulting from a particular
amino acid sequence and solvent condition. The number of
possible pentapeptides of the 20 naturally occurring amino
acids, for example, is 20^5 or 3.2 million different peptides.
The likelihood that molecules of this size might be useful in
receptor-binding studies is supported by epitope analysis
studies showing that some antibodies recognize sequences
as short as a few amino acids with high specificity.
Furthermore, the average molecular weight of amino acids
puts small peptides in the size range of many currently
useful pharmaceutical products.
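
The pentapeptide count given above follows from raising the alphabet size to the peptide length. As a minimal sketch (the names `AMINO_ACIDS`, `peptide_count` and `enumerate_peptides` are illustrative and do not come from the source), the arithmetic and a lazy enumeration of the sequence space can be written as:

```python
from itertools import product

# One-letter codes for the 20 naturally occurring amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def peptide_count(length: int, alphabet_size: int = len(AMINO_ACIDS)) -> int:
    """Number of distinct linear peptides of the given length."""
    return alphabet_size ** length


def enumerate_peptides(length: int):
    """Lazily yield every peptide of the given length as a string."""
    return ("".join(residues) for residues in product(AMINO_ACIDS, repeat=length))


pentapeptides = peptide_count(5)           # 20 ** 5
first_pentapeptide = next(enumerate_peptides(5))
```

The generator avoids materializing all sequences at once, which matters because the count grows by a factor of 20 with each added residue.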

Pharmaceutical drug discovery is one type of research 
which relies on such a study of structure-activity relation- 
ships. In most cases, contemporary pharmaceutical research 
can be described as the process of discovering novel ligands 
with desirable patterns of specificity for biologically impor- 
tant receptors. Another example is research to discover new 
compounds for use in agriculture, such as pesticides and 
herbicides. 

Sometimes, the solution to a rational process of designing 
ligands is difficult or unyielding. Prior methods of preparing 
large numbers of different polymers have been painstakingly 
slow when used at a scale sufficient to permit effective 
rational or random screening. For example, the "Merrifield" 
method (J. Am. Chem. Soc. (1963) 85:2149-2154, which is
incorporated herein by reference for all purposes) has been 
used to synthesize peptides on a solid support. In the 
Merrifield method, an amino acid is covalently bonded to a 
support made of an insoluble polymer. Another amino acid 
with an alpha protected group is reacted with the covalently 
bonded amino acid to form a dipeptide. After washing, the 
protective group is removed and a third amino acid with an 
alpha protective group is added to the dipeptide. This 
process is continued until a peptide of a desired length and 
sequence is obtained. Using the Merrifield method, it is not 
economically practical to synthesize more than a handful of 
peptide sequences in a day. 

To synthesize larger numbers of polymer sequences, it has 
also been proposed to use a series of reaction vessels for 
polymer synthesis. For example, a tubular reactor system 
may be used to synthesize a linear polymer on a solid phase 
support by automated sequential addition of reagents. This 
method still does not enable the synthesis of a sufficiently 
large number of polymer sequences for effective economical 
screening. 

Methods of preparing a plurality of polymer sequences 
are also known in which a foraminous container encloses a 
known quantity of reactive particles, the particles being 
larger in size than foramina of the container. The containers 
may be selectively reacted with desired materials to synthe- 
size desired sequences of product molecules. As with other 
methods known in the art, this method cannot practically be 
used to synthesize a sufficient variety of polypeptides for
effective screening. 

Other techniques have also been described. These meth- 
ods include the synthesis of peptides on 96 plastic pins 
which fit the format of standard microtiter plates.
Unfortunately, while these techniques have been somewhat 
useful, substantial problems remain. For example, these 
methods continue to be limited in the diversity of sequences 
which can be economically synthesized and screened.

From the above, it is seen that an improved method and 
apparatus for synthesizing a variety of chemical sequences 
at known locations is desired. 

SUMMARY OF THE INVENTION 

An improved method and apparatus for the preparation of
a variety of polymers is disclosed.

In one preferred embodiment, linker molecules are
provided on a substrate. A terminal end of the linker molecules
is provided with a reactive functional group protected with
a photoremovable protective group. Using lithographic
methods, the photoremovable protective group is exposed to
light and removed from the linker molecules in first selected
regions. The substrate is then washed or otherwise contacted
with a first monomer that reacts with exposed functional
groups on the linker molecules. In a preferred embodiment,
the monomer is an amino acid containing a photoremovable
protective group at its amino or carboxy terminus and the
linker molecule terminates in an amino or carboxy acid
group bearing a photoremovable protective group.

A second set of selected regions is, thereafter, exposed to
light and the photoremovable protective group on the linker 
molecule/protected amino acid is removed at the second set 
of regions. The substrate is then contacted with a second 
monomer containing a photoremovable protective group for 
reaction with exposed functional groups. This process is 
repeated to selectively apply monomers until polymers of a 
desired length and desired chemical sequence are obtained. 
Photolabile groups are then optionally removed and the 
sequence is, thereafter, optionally capped. Side chain pro- 
tective groups, if present, are also removed. 

By using the lithographic techniques disclosed herein, it 
is possible to direct light to relatively small and precisely 
known locations on the substrate. It is, therefore, possible to 
synthesize polymers of a known chemical sequence at 
known locations on the substrate. 

The resulting substrate will have a variety of uses 
including, for example, screening large numbers of poly- 
mers for biological activity. To screen for biological activity, 
the substrate is exposed to one or more receptors such as 
antibodies, whole cells, receptors on vesicles, lipids, or any
one of a variety of other receptors. The receptors are 
preferably labeled with, for example, a fluorescent marker, 
radioactive marker, or a labeled antibody reactive with the 
receptor. The location of the marker on the substrate is 
detected with, for example, photon detection or autoradio- 
graphic techniques. Through knowledge of the sequence of 
the material at the location where binding is detected, it is 
possible to quickly determine which sequence binds with the 
receptor and, therefore, the technique can be used to screen 
large numbers of peptides. Other possible applications of the 
inventions herein include diagnostics in which various anti- 
bodies for particular receptors would be placed on a sub- 
strate and, for example, blood sera would be screened for 
immune deficiencies. Still further applications include, for 
example, selective "doping" of organic materials in semi- 
conductor devices, and the like. 

In connection with one aspect of the invention an 
improved reactor system for synthesizing polymers is also 
disclosed. The reactor system includes a substrate mount 
which engages a substrate around a periphery thereof. The 
substrate mount provides for a reactor space between the 
substrate and the mount through or into which reaction fluids 
are pumped or flowed. A mask is placed on or focused on the 
substrate and illuminated so as to deprotect selected regions 
of the substrate in the reactor space. A monomer is pumped 
through the reactor space or otherwise contacted with the 
substrate and reacts with the deprotected regions. By selec- 
tively deprotecting regions on the substrate and flowing 
predetermined monomers through the reactor space, desired 
polymers at known locations may be synthesized. 

Improved detection apparatus and methods are also disclosed.
The detection method and apparatus utilize a substrate
having a large variety of polymer sequences at known
locations on a surface thereof. The substrate is exposed to a
fluorescently labeled receptor which binds to one or more of
the polymer sequences. The substrate is placed in a microscope
detection apparatus for identification of locations
where binding takes place. The microscope detection apparatus
includes a monochromatic or polychromatic light
source for directing light at the substrate, means for detecting
fluoresced light from the substrate, and means for
determining a location of the fluoresced light. The means for
detecting light fluoresced on the substrate may in some
embodiments include a photon counter. The means for
determining a location of the fluoresced light may include an
x/y translation table for the substrate. Translation of the slide
and data collection are recorded and managed by an appropriately
programmed digital computer.

A further understanding of the nature and advantages of 
the inventions herein may be realized by reference to the 
remaining portions of the specification and the attached 
drawings. 

BRIEF DESCRIPTION OF THE FIGURES 

FIG. 1 illustrates masking and irradiation of a substrate at 
a first location. The substrate is shown in cross-section; 

FIG. 2 illustrates the substrate after application of a 
monomer "A"; 

FIG. 3 illustrates irradiation of the substrate at a second 
location; 

FIG. 4 illustrates the substrate after application of mono- 
mer "B"; 

FIG. 5 illustrates irradiation of the "A" monomer;

FIG. 6 illustrates the substrate after a second application
of "B";

FIG. 7 illustrates a completed substrate; 

FIGS. 8A and 8B illustrate alternative embodiments of a
reactor system for forming a plurality of polymers on a 
substrate; 

FIG. 9 illustrates a detection apparatus for locating fluo- 
rescent markers on the substrate; 

FIGS. 10A-10M illustrate the method as it is applied to 
the production of the trimers of monomers "A" and "B"; 

FIGS. 11A and 11B are fluorescence traces for standard 
fluorescent beads; 

FIGS. 12A and 12B are fluorescence curves for NVOC 
slides not exposed and exposed to light respectively; 

FIGS. 13A to 13D are fluorescence plots of slides exposed
through 100 µm, 50 µm, 20 µm, and 10 µm masks;

FIGS. 14A and 14B illustrate fluorescence of a slide with
the peptide YGGFL on selected regions of its surface which
has been exposed to labeled Herz antibody specific for this
sequence;

FIGS. 15A and 15B illustrate formation of, and a fluorescence
plot of, a slide with a checkerboard pattern of YGGFL
and GGFL exposed to labeled Herz antibody. FIG. 15A
illustrates a 500×500 µm mask which has been focused on
the substrate according to FIG. 8A while FIG. 15B illustrates
a 50×50 µm mask placed in direct contact with the substrate
in accord with FIG. 8B;

FIG. 16 is a fluorescence plot of YGGFL and PGGFL
synthesized in a 50 µm checkerboard pattern;

FIG. 17 is a fluorescence plot of YPGGFL and YGGFL
synthesized in a 50 µm checkerboard pattern;

FIGS. 18A and 18B illustrate the mapping of sixteen
sequences synthesized on two different glass slides;

FIG. 19 is a fluorescence plot of the slide illustrated in
FIG. 18A; and

FIG. 20 is a fluorescence plot of the slide illustrated in
FIG. 18B.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

I. Glossary

The following terms are intended to have the following 
general meanings as they are used herein: 

1. Complementary: Refers to the topological compatibility 
or matching together of interacting surfaces of a ligand 
molecule and its receptor. Thus, the receptor and its ligand
can be described as complementary, and furthermore, the 
contact surface characteristics are complementary to each 
other. 

2. Epitope: The portion of an antigen molecule which is 
delineated by the area of interaction with the subclass of
receptors known as antibodies. 

3. Ligand: A ligand is a molecule that is recognized by a 
particular receptor. Examples of ligands that can be inves- 
tigated by this invention include, but are not restricted to, 
agonists and antagonists for cell membrane receptors,
toxins and venoms, viral epitopes, hormones (e.g., 
opiates, steroids, etc.), hormone receptors, peptides, 
enzymes, enzyme substrates, cofactors, drugs, lectins, 
sugars, oligonucleotides, nucleic acids, oligosaccharides, 
proteins, and monoclonal antibodies.

4. Monomer: A member of the set of small molecules which 
can be joined together to form a polymer. The set of
monomers includes but is not restricted to, for example, 
the set of common L-amino acids, the set of D-amino 
acids, the set of synthetic amino acids, the set of nucle- 
otides and the set of pentoses and hexoses. As used herein, 
monomers refers to any member of a basis set for syn- 
thesis of a polymer. For example, dimers of L-amino acids 
form a basis set of 400 monomers for synthesis of 
polypeptides. Different basis sets of monomers may be 
used at successive steps in the synthesis of a polymer. 

5. Peptide: A polymer in which the monomers are alpha 
amino acids and which are joined together through amide 
bonds and alternatively referred to as a polypeptide. In the 
context of this specification it should be appreciated that 
the amino acids may be the L-optical isomer or the 
D-optical isomer. Peptides are more than two amino acid 
monomers long, and often more than 20 amino acid 
monomers long. Standard abbreviations for amino acids 
are used (e.g., P for proline). These abbreviations are 
included in Stryer, Biochemistry, Third Ed., 1988, which is
incorporated herein by reference for all purposes. 

6. Radiation: Energy which may be selectively applied 
including energy having a wavelength of between 10⁻¹⁴
and 10⁴ meters including, for example, electron beam
radiation, gamma radiation, x-ray radiation, ultra-violet 
radiation, visible light, infrared radiation, microwave 
radiation, and radio waves. "Irradiation" refers to the 
application of radiation to a surface. 

7. Receptor: A molecule that has an affinity for a given 
ligand. Receptors may be naturally occurring or manmade
molecules. Also, they can be employed in their unaltered 
state or as aggregates with other species. Receptors may 
be attached, covalently or noncovalently, to a binding 
member, either directly or via a specific binding sub- 
stance. Examples of receptors which can be employed by 
this invention include, but are not restricted to, antibodies, 
cell membrane receptors, monoclonal antibodies and anti- 
sera reactive with specific antigenic determinants (such as 
on viruses, cells or other materials), drugs, 
polynucleotides, nucleic acids, peptides, cofactors, 
lectins, sugars, polysaccharides, cells, cellular 
membranes, and organelles. Receptors are sometimes 
referred to in the art as anti-ligands. As the term receptors 
is used herein, no difference in meaning is intended. A 
"Ligand Receptor Pair" is formed when two macromol- 
ecules have combined through molecular recognition to 
form a complex. 

Other examples of receptors which can be investigated by 
this invention include but are not restricted to: 

a) Microorganism receptors: Determination of ligands 
which bind to receptors, such as specific transport 
proteins or enzymes essential to survival of 
microorganisms, is useful in developing a new class of antibiotics.
Of particular value would be antibiotics against oppor- 
tunistic fungi, protozoa, and those bacteria resistant to 
the antibiotics in current use. 

b) Enzymes: For instance, the binding site of enzymes 
such as the enzymes responsible for cleaving neu- 
rotransmitters; determination of ligands which bind to 
certain receptors to modulate the action of the enzymes 
which cleave the different neurotransmitters is useful in 
the development of drugs which can be used in the 
treatment of disorders of neurotransmission. 

c) Antibodies: For instance, the invention may be useful 
in investigating the ligand-binding site on the antibody 
molecule which combines with the epitope of an anti- 
gen of interest; determining a sequence that mimics an 
antigenic epitope may lead to the development of
vaccines of which the immunogen is based on one or 
more of such sequences or lead to the development of 
related diagnostic agents or compounds useful in thera- 
peutic treatments such as for auto-immune diseases 
(e.g., by blocking the binding of the "self" antibodies).

d) Nucleic Acids: Sequences of nucleic acids may be 
synthesized to establish DNA or RNA binding 
sequences. 

e) Catalytic Polypeptides: Polymers, preferably 
polypeptides, which are capable of promoting a chemi- 
cal reaction involving the conversion of one or more 
reactants to one or more products. Such polypeptides 
generally include a binding site specific for at least one 
reactant or reaction intermediate and an active func- 
tionality proximate to the binding site, which function- 
ality is capable of chemically modifying the bound 
reactant. Catalytic polypeptides are described in, for 
example, U.S. application Ser. No. 404,920, which is 
incorporated herein by reference for all purposes. 

f) Hormone receptors: For instance, the receptors for 
insulin and growth hormone. Determination of the 
ligands which bind with high affinity to a receptor is 
useful in the development of, for example, an oral 
replacement of the daily injections which diabetics 
must take to relieve the symptoms of diabetes, and in 
the other case, a replacement for the scarce human 
growth hormone which can only be obtained from 
cadavers or by recombinant DNA technology. Other 
examples are the vasoconstrictive hormone receptors; 
determination of those ligands which bind to a receptor
may lead to the development of drugs to control blood
pressure. 

g) Opiate receptors: Determination of ligands which bind 
to the opiate receptors in the brain is useful in the 
development of less-addictive replacements for mor- 
phine and related drugs. 

8. Substrate: A material having a rigid or semi-rigid surface. 
In many embodiments, at least one surface of the substrate 
will be substantially flat, although in some embodiments 
it may be desirable to physically separate synthesis 
regions for different polymers with, for example, wells, 
raised regions, etched trenches, or the like. According to 
other embodiments, small beads may be provided on the 
surface which may be released upon completion of the 
synthesis. 

9. Protective Group: A material which is bound to a mono- 
mer unit and which may be spatially removed upon 
selective exposure to an activator such as electromagnetic 
radiation. Examples of protective groups with utility 
herein include Nitroveratryloxy carbonyl, Nitrobenzyloxy 
carbonyl, Dimethyl dimethoxybenzyloxy carbonyl, 
5-Bromo-7-nitroindolinyl, o-Hydroxy-α-methyl
cinnamoyl, and 2-oxymethylene anthraquinone. Other
examples of activators include ion beams, electric fields, 
magnetic fields, electron beams, x-ray, and the like. 

10. Predefined Region: A predefined region is a localized 
area on a surface which is, was, or is intended to be 
activated for formation of a polymer. The predefined 
region may have any convenient shape, e.g., circular, 
rectangular, elliptical, wedge-shaped, etc. For the sake of 
brevity herein, "predefined regions" are sometimes 
referred to simply as "regions."

11. Substantially Pure: A polymer is considered to be "substantially
pure" within a predefined region of a substrate
when it exhibits characteristics that distinguish it from
other predefined regions. Typically, purity will be measured
in terms of biological activity or function as a result
of uniform sequence. Such characteristics will typically
be measured by way of binding with a selected ligand or
receptor.
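
Merely by way of illustration, the basis-set arithmetic given in the definition of "Monomer" above (dimers of L-amino acids forming a basis set of 400) can be checked with a short sketch. The one-letter codes and the function name dimer_basis_set are assumptions introduced for this example only and do not appear in the specification.

```python
from itertools import product

# One-letter codes for the 20 common L-amino acids (an assumed alphabet
# for this illustration; the specification only refers to the set).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dimer_basis_set(monomers: str) -> list[str]:
    """Return every ordered pair of monomers, each pair treated as one coupling unit."""
    return ["".join(pair) for pair in product(monomers, repeat=2)]


dimers = dimer_basis_set(AMINO_ACIDS)
print(len(AMINO_ACIDS))  # 20
print(len(dimers))       # 400
```

Ordered pairs are counted because a dimer A-B is a different coupling unit from B-A, which gives 20 × 20 = 400.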

II. General 

The present invention provides methods and apparatus for
the preparation and use of a substrate having a plurality of
polymer sequences in predefined regions. The invention is
described herein primarily with regard to the preparation of
molecules containing sequences of amino acids, but could
readily be applied in the preparation of other polymers. Such
polymers include, for example, both linear and cyclic polymers
of nucleic acids, polysaccharides, phospholipids, and
peptides having either α-, β-, or ω-amino acids, heteropolymers
in which a known drug is covalently bound to any
of the above, polyurethanes, polyesters, polycarbonates,
polyureas, polyamides, polyethyleneimines, polyarylene
sulfides, polysiloxanes, polyimides, polyacetates, or other
polymers which will be apparent upon review of this disclosure.
In a preferred embodiment, the invention herein is
used in the synthesis of peptides.

The prepared substrate may, for example, be used in
screening a variety of polymers as ligands for binding with
a receptor, although it will be apparent that the invention
could be used for the synthesis of a receptor for binding with
a ligand. The substrate disclosed herein will have a wide
variety of other uses. Merely by way of example, the
invention herein can be used in determining peptide and
nucleic acid sequences which bind to proteins, finding
sequence-specific binding drugs, identifying epitopes recognized
by antibodies, and evaluation of a variety of drugs
for clinical and diagnostic applications, as well as combinations
of the above.

The invention preferably provides for the use of a substrate
"S" with a surface. Linker molecules "L" are optionally
provided on a surface of the substrate. The purpose of
the linker molecules, in some embodiments, is to facilitate
receptor recognition of the synthesized polymers.
Optionally, the linker molecules may be chemically protected
for storage purposes. A chemical storage protective
group such as t-BOC (t-butoxycarbonyl) may be used in
some embodiments. Such chemical protective groups would
be chemically removed upon exposure to, for example,
acidic solution and would serve to protect the surface during
storage and be removed prior to polymer preparation.

On the substrate or a distal end of the linker molecules, a
functional group with a protective group P0 is provided. The
protective group P0 may be removed upon exposure to
radiation, electric fields, electric currents, or other activators
to expose the functional group.

In a preferred embodiment, the radiation is ultraviolet
(UV), infrared (IR), or visible light. As more fully described
below, the protective group may alternatively be an
electrochemically-sensitive group which may be removed in
the presence of an electric field. In still further alternative
embodiments, ion beams, electron beams, or the like may be
used for deprotection.

In some embodiments, the exposed regions and, therefore,
the area upon which each distinct polymer sequence is
synthesized are smaller than about 1 cm² or less than 1 mm².
In preferred embodiments the exposed area is less than about
10,000 µm² or, more preferably, less than 100 µm² and may,
in some embodiments, encompass the binding site for as few
as a single molecule. Within these regions, each polymer is
preferably synthesized in a substantially pure form.
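
By way of a worked illustration only, the region areas recited above can be converted into an upper bound on the number of distinct regions that fit in one square centimeter of substrate. The function name max_regions_per_cm2 is hypothetical, and the sketch assumes regions that tile the surface with no spacing between them, which the specification does not require.

```python
UM2_PER_CM2 = 10**8  # 1 cm = 10^4 µm, so 1 cm² = 10^8 µm²


def max_regions_per_cm2(region_area_um2: int) -> int:
    """Upper bound on non-overlapping regions of the given area within 1 cm²."""
    return UM2_PER_CM2 // region_area_um2


for label, area in [("1 mm²", 10**6), ("10,000 µm²", 10**4), ("100 µm²", 10**2)]:
    print(f"{label}: up to {max_regions_per_cm2(area):,} regions per cm²")
```

Under these assumptions the bound rises from 100 regions per cm² at 1 mm² per region to 1,000,000 regions per cm² at 100 µm² per region.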

Concurrently or after exposure of a known region of the
substrate to light, the surface is contacted with a first
monomer unit M1 which reacts with the functional group
which has been exposed by the deprotection step. The first
monomer includes a protective group P1. P1 may or may not
be the same as P0.

Accordingly, after a first cycle, known first regions of the 
surface may comprise the sequence:

S-L-M1-P1

while remaining regions of the surface comprise the sequence:

S-L-P0.

Thereafter, second regions of the surface (which may
include the first region) are exposed to light and contacted
with a second monomer M2 (which may or may not be the
same as M1) having a protective group P2. P2 may or may
not be the same as P0 and P1. After this second cycle,
different regions of the substrate may comprise one or more
of the following sequences:

S-L-M1-M2-P2

S-L-M2-P2

S-L-M1-P1 and/or

S-L-P0.



The above process is repeated until the substrate includes 
desired polymers of desired lengths. By controlling the 
locations of the substrate exposed to light and the reagents 
exposed to the substrate following exposure, the location of 
each sequence will be known. 
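
The two cycles described above may be sketched, purely for illustration, as a small simulation. The class Substrate, its method names, and the region names "a" through "d" are assumptions made for this example; the sketch tracks only which regions are deprotected and which monomers have been coupled, and models no chemistry.

```python
class Substrate:
    """Toy model of a substrate S carrying linker L in several named regions."""

    def __init__(self, region_names):
        # Every region starts as S-L-P0: linker capped by protective group P0.
        self.chains = {name: ["S", "L"] for name in region_names}
        self.group = {name: "P0" for name in region_names}

    def expose(self, mask):
        """Remove the protective group only in the regions the mask leaves open."""
        for name in mask:
            self.group[name] = None

    def couple(self, monomer, protective_group):
        """Contact the whole surface with a protected monomer; only deprotected regions react."""
        for name in list(self.group):
            if self.group[name] is None:
                self.chains[name].append(monomer)
                self.group[name] = protective_group

    def formula(self, name):
        tail = [self.group[name]] if self.group[name] else []
        return "-".join(self.chains[name] + tail)


substrate = Substrate(["a", "b", "c", "d"])

# First cycle: first regions (a, b) are exposed, then contacted with M1-P1.
substrate.expose({"a", "b"})
substrate.couple("M1", "P1")

# Second cycle: second regions (b, c) are exposed, then contacted with M2-P2.
substrate.expose({"b", "c"})
substrate.couple("M2", "P2")

for name in "abcd":
    print(name, substrate.formula(name))
```

With the second regions overlapping the first in region "b", the four regions end up as S-L-M1-P1, S-L-M1-M2-P2, S-L-M2-P2, and S-L-P0, matching the four sequences listed above. Because the masks and reagents of each cycle are recorded, the sequence at every location is known.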

Thereafter, the protective groups are removed from some 
or all of the substrate and the sequences are, optionally, 
capped with a capping unit C. The process results in a 
substrate having a surface with a plurality of polymers of the 
following general formula: 

S-[L]-(Mi)-(Mj)-(Mk) . . . (Mx)-[C]

where square brackets indicate optional groups, and Mi . . . Mx
indicates any sequence of monomers. The number of
monomers could cover a wide variety of values, but in a 
preferred embodiment they will range from 2 to 100. 

In some embodiments, polymers at a plurality of locations on
the substrate are to contain a common monomer
subsequence. For example, it may be desired to synthesize
a sequence S-M1-M2-M3 at first locations and a sequence
S-M4-M2-M3 at second locations. The process would commence
with irradiation of the first locations followed by
contacting with M1-P, resulting in the sequence S-M1-P at
the first locations. The second locations would then be
irradiated and contacted with M4-P, resulting in the sequence
S-M4-P at the second locations. Thereafter both the first and
second locations would be irradiated and contacted with the
dimer M2-M3, resulting in the sequence S-M1-M2-M3 at the
first locations and S-M4-M2-M3 at the second locations. Of
course, common subsequences of any length could be utilized
including those in a range of 2 or more monomers, 2
to 100 monomers, 2 to 20 monomers, and a most preferred
range of 2 to 3 monomers.
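
As a further illustrative sketch, the common-subsequence example above can be written as a list of (irradiated locations, coupled unit) steps. The function synthesize and the location names "first" and "second" are hypothetical, and protective groups are omitted for brevity.

```python
def synthesize(locations, steps):
    """Apply each step's unit (one or more monomers) to the irradiated locations only."""
    chains = {loc: ["S"] for loc in locations}
    for irradiated, unit in steps:
        for loc in irradiated:
            chains[loc].extend(unit)
    return {loc: "-".join(chain) for loc, chain in chains.items()}


steps = [
    ({"first"}, ["M1"]),                  # irradiate first locations, couple M1
    ({"second"}, ["M4"]),                 # irradiate second locations, couple M4
    ({"first", "second"}, ["M2", "M3"]),  # irradiate both, couple the dimer M2-M3
]

result = synthesize(["first", "second"], steps)
print(result["first"])   # S-M1-M2-M3
print(result["second"])  # S-M4-M2-M3
```

In this sketch, coupling the shared subsequence as a single dimer takes three irradiation-and-coupling steps, whereas adding M2 and M3 one monomer at a time would take four.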

According to other embodiments, a set of masks is used
for the first monomer layer and, thereafter, varied light
wavelengths are used for selective deprotection. For
example, in the process discussed above, first regions are
first exposed through a mask and reacted with a first monomer
having a first protective group P1, which is removable
upon exposure to a first wavelength of light (e.g., IR).
Second regions are masked and reacted with a second
monomer having a second protective group P2, which is
removable upon exposure to a second wavelength of light
(e.g., UV). Thereafter, masks become unnecessary in the
synthesis because the entire substrate may be exposed
alternately to the first and second wavelengths of light in
the deprotection cycle.
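
The mask-free deprotection cycle described above may be sketched as follows, by way of example only. The table LABILE_AT, the function flood_cycle, and the monomer labels A through D are assumptions for this illustration; the pairing of P1 with IR and P2 with UV follows the example wavelengths given above.

```python
# Which wavelength removes each protective group (per the example in the text).
LABILE_AT = {"P1": "IR", "P2": "UV"}


def flood_cycle(regions, wavelength, monomer, new_group):
    """Expose the whole substrate at one wavelength, then couple a protected monomer.

    regions maps a region name to (list of monomers, terminal protective group).
    Only regions whose terminal group is labile at this wavelength react.
    """
    updated = {}
    for name, (chain, group) in regions.items():
        if LABILE_AT[group] == wavelength:
            updated[name] = (chain + [monomer], new_group)
        else:
            updated[name] = (chain, group)
    return updated


# State after the masked first layer: A-P1 in first regions, B-P2 in second regions.
regions = {"first": (["A"], "P1"), "second": (["B"], "P2")}

regions = flood_cycle(regions, "IR", "C", "P1")  # extends only the P1-capped regions
regions = flood_cycle(regions, "UV", "D", "P2")  # extends only the P2-capped regions
print(regions)
```

No mask appears after the first layer in this sketch: the choice of wavelength alone selects which regions are extended, provided each newly coupled monomer carries the protective group that keeps its region addressable.
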

The polymers prepared on a substrate according to the
above methods will have a variety of uses including, for
example, screening for biological activity. In such screening
activities, the substrate containing the sequences is exposed
to an unlabeled or labeled receptor such as an antibody,
receptor on a cell, phospholipid vesicle, or any one of a
variety of other receptors. In one preferred embodiment the
polymers are exposed to a first, unlabeled receptor of interest
and, thereafter, exposed to a labeled receptor-specific recognition
element, which is, for example, an antibody. This
process will provide signal amplification in the detection
stage.

The receptor molecules may bind with one or more
polymers on the substrate. The presence of the labeled
receptor and, therefore, the presence of a sequence which
binds with the receptor is detected in a preferred embodiment
through the use of autoradiography, detection of fluorescence
with a charge-coupled device, fluorescence
microscopy, or the like. The sequence of the polymer at the
locations where the receptor binding is detected may be used
to determine all or part of a sequence which is complementary
to the receptor.
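
Because the sequence at each location is known, identifying a binding sequence reduces to a lookup from the locations where signal is detected. The following is a minimal sketch under stated assumptions: the function binding_sequences, the 2×2 layout, the photon counts, and the threshold are all hypothetical values invented for illustration and are not measurements from this specification.

```python
def binding_sequences(layout, counts, threshold):
    """Return the sequences whose region shows a signal at or above the threshold.

    layout: {(row, col): sequence synthesized at that location}
    counts: {(row, col): detected signal at that location}
    """
    return sorted({layout[pos] for pos, n in counts.items() if n >= threshold})


# Hypothetical 2x2 checkerboard of YGGFL and GGFL with made-up counts.
layout = {(0, 0): "YGGFL", (0, 1): "GGFL", (1, 0): "GGFL", (1, 1): "YGGFL"}
counts = {(0, 0): 950, (0, 1): 40, (1, 0): 55, (1, 1): 1010}

print(binding_sequences(layout, counts, threshold=500))  # ['YGGFL']
```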

Use of the invention herein is illustrated primarily with
reference to screening for biological activity. The invention
will, however, find many other uses. For example, the
invention may be used in information storage (e.g., on
optical disks), production of molecular electronic devices,
production of stationary phases in separation sciences, production
of dyes and brightening agents, photography, and in
immobilization of cells, proteins, lectins, nucleic acids,
polysaccharides and the like in patterns on a surface via
molecular recognition of specific polymer sequences. By
synthesizing the same compound in adjacent, progressively
differing concentrations, a gradient will be established to
control chemotaxis or to develop diagnostic dipsticks which,
for example, titrate an antibody against an increasing
amount of antigen. By synthesizing several catalyst molecules
in close proximity, more efficient multistep conversions
may be achieved by "coordinate immobilization."
Coordinate immobilization also may be used for electron
transfer systems, as well as to provide both structural
integrity and other desirable properties to materials such as
lubrication, wetting, etc.

According to alternative embodiments, molecular biodis- 
tribution or pharmacokinetic properties may be examined. 
For example, to assess resistance to intestinal or serum 
proteases, polymers may be capped with a fluorescent tag
and exposed to biological fluids of interest. 

III. Polymer Synthesis 

FIG. 1 illustrates one embodiment of the invention dis- 
60 closed herein in which a substrate 2 is shown in cross- 
section. Essentially, any conceivable substrate may be 
employed in the invention. The substrate may be biological, 
nonbiological, organic, inorganic, or a combination of any of 
these, existing as particles, strands, precipitates, gels, sheets, 
65 tubing, spheres, containers, capillaries, pads, slices, films, 
plates, slides, etc. The substrate may have any convenient 
shape, such as a disc, square, sphere, circle, etc. The 
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substrate is preferably flat but may take on a variety of bonds (using, for example, glass or silicon oxide surfaces), 

alternative surface configurations. For example, the sub- Siloxane bonds with the surface of the substrate may be 

strate may contain raised or depressed regions on which the formed in one embodiment via reactions of linker molecules 

synthesis takes place. The substrate and its surface prefer- bearing trichlorosilyl groups. The linker molecules may 

ably form a rigid support on which to carry out the reactions 5 optionally be attached in an ordered array, i.e., as parts of the 

described herein. The substrate and its surface is also chosen ° ead S rou P s in a polymerized Langmuir Blodgctt film. In 

to provide appropriate light-absorbing characteristics. For alternative embodiments, the linker molecules are adsorbed 

instance, the substrate may be a polymerized Langmuir lo the surface of the substrate. 

Blodgett film, funcUonalized glass, Si, Ge, GaAs, GaP, Si0 2 , The hnker molecules and monomers used herein are 

SiN 4 , modified silicon, or any one of a wide variety of gels 10 P rovi£ ¥ Wlth a ^nctiomal group to which is bound a 

or polymers such as (poly)tetrafluoroetnylene, (poly) group. Preferably, the protective group is on he 

• i*j j-fl • i , 4 " * _u* distal or terminal end of the linker molecule opposite the 

vmyhdenedifluoride polystyrene, polycarbona e, or combi- M ^ ^ ^ be cithef ^ negative 

nations thereof. Other substrate materials will be readily lective group (ix me p rote ctive group renders the linker 

apparent to those of skill in is the art upon review of this moIecules less reactivc ^ a mon omer upon exposure) or 

disclosure. In a preferred embodiment the substrate is flat is a positive protective group (i.e., the protective group renders 

glass or single-crystal silicon with surface relief features of (he Hnker molecules more reactive with a monomer upon 

less than 10 A. exposure). In the case of negative protective groups an 

According to some embodiments, the surface of the additional step of reactivation will be required. In some 

substrate is etched using well known techniques to provide embodiments, this will be done by heating, 

for desired surface features. For example, by way of the 20 The protective group on the linker molecules may be 

formation of trenches, v-grooves, mesa structures, or the selected from a wide variety of positive light-reactive groups 

like, the synthesis regions may be more closely placed preferably including nitro aromatic compounds such as 

within the focus point of impinging light, be provided with o-nitrobenzyl derivatives or benzylsulfonyl. In a preferred 

reflective "mirror" structures for maximization of light col- embodiment, 6-nitroveratryloxy-carbonyl (NVOC), 

lection from fluorescent sources, or the like. 25 2-nitrobenzyloxycarbonyl (NBOC) or a,a-dimethyl- 

Surfaces on the solid substrate will usually, though not dimethoxybenzyloxycarbonyl (DDZ) is used. In one 

always, be composed of the same material as the substrate. embodiment, a nitro aromatic compound containing a ben- 

Thus, the surface may be composed of any of a wide variety zylic hydrogen ortho to the nitro group is used, i.e., a 

of materials, for example, polymers, plastics, resins, chemical of the form: 
polysaccharides, silica or silica-based materials, carbon, 
metals, inorganic glasses, membranes, or any of the above- 
listed substrate materials. In some embodiments the surface 
may provide for the use of caged binding members which 
are attached firmly to the surface of the substrate in accord 
with the teaching of copending application Ser. No. 404,920, 
previously incorporated herein by reference. Preferably, the 
surface will contain reactive groups, which could be 
carboxyl, amino, hydroxyl, or the like. Most preferably, the 
surface will be optically transparent and will have surface 
Si — OH functionalities, such as are found on silica surfaces. 

The surface 4 of the substrate is preferably provided with 40 where Rj is alkoxy, alkyl, halo, aryl, alkenyl, or hydrogen; 
a layer of linker molecules 6, although it will be understood R 2 & alkoxy, alkyl, halo, aryl, nitro, or hydrogen; R 3 is 
that the linker molecules are not required elements of the alkoxy, alkyl, halo, nitro, aryl, or hydrogen; R 4 is alkoxy, 
invention. The linker molecules are preferably of sufficient
length to permit polymers in a completed substrate to
interact freely with molecules exposed to the substrate. The
linker molecules should be 6-50 atoms long to provide
sufficient exposure. The linker molecules may be, for
example, aryl acetylene, ethylene glycol oligomers containing
2-10 monomer units, diamines, diacids, amino acids, or
combinations thereof. Other linker molecules may be used
in light of this disclosure.

According to alternative embodiments, the linker molecules
are selected based upon their hydrophilic/hydrophobic
properties to improve presentation of synthesized polymers
to certain receptors. For example, in the case of a hydrophilic
receptor, hydrophilic linker molecules will be preferred so as
to permit the receptor to more closely approach the
synthesized polymer.

According to another alternative embodiment, linker molecules
are also provided with a photocleavable group at an
intermediate position. The photocleavable group is preferably
cleavable at a wavelength different from the protective
group. This enables removal of the various polymers following
completion of the synthesis by way of exposure to
the different wavelengths of light.

The linker molecules can be attached to the substrate via
carbon-carbon bonds using, for example, (poly)trifluorochloroethylene
surfaces, or preferably, by siloxane

alkyl, hydrogen, aryl, halo, or nitro; and R5 is alkyl, alkynyl,
cyano, alkoxy, hydrogen, halo, aryl, or alkenyl. Other materials
which may be used include o-hydroxy-α-methyl cinnamoyl
derivatives. Photoremovable protective groups are
described in, for example, Patchornik, J. Am. Chem. Soc.
(1970) 92:6333 and Amit et al., J. Org. Chem. (1974)
39:192, both of which are incorporated herein by reference.

In an alternative embodiment the positive reactive group
is activated for reaction with reagents in solution. For
example, a 5-bromo-7-nitro indoline group, when bound to
a carbonyl, undergoes reaction upon exposure to light at 420
nm.

In a second alternative embodiment, the reactive group on
the linker molecule is selected from a wide variety of
negative light-reactive groups including a cinnamate group.

Alternatively, the reactive group is activated or deactivated
by electron beam lithography, x-ray lithography, or
any other radiation. Suitable reactive groups for electron
beam lithography include sulfonyl. Other methods may be
used including, for example, exposure to a current source.
Other reactive groups and methods of activation may be
used in light of this disclosure.

As shown in FIG. 1, the linking molecules are preferably
exposed to, for example, light through a suitable mask 8
using photolithographic techniques of the type known in the
semiconductor industry and described in, for example, Sze,
VLSI Technology, McGraw-Hill (1983), and Mead et al., 
Introduction to VLSI Systems, Addison-Wesley (1980), 
which are incorporated herein by reference for all purposes. 
The light may be directed at either the surface containing the 
protective groups or at the back of the substrate, so long as 
the substrate is transparent to the wavelength of light needed 
for removal of the protective groups. In the embodiment 
shown in FIG. 1, light is directed at the surface of the 
substrate containing the protective groups. FIG. 1 illustrates 
the use of such masking techniques as they are applied to a 
positive reactive group so as to activate linking molecules 
and expose functional groups in areas 10a and 10b. 

The mask 8 is in one embodiment a transparent support 
material selectively coated with a layer of opaque material. 
Portions of the opaque material are removed, leaving opaque 
material in the precise pattern desired on the substrate 
surface. The mask is brought into close proximity with, 
imaged on, or brought directly into contact with the substrate 
surface as shown in FIG. 1. "Openings" in the mask corre- 
spond to locations on the substrate where it is desired to 
remove photoremovable protective groups from the sub- 
strate. Alignment may be performed using conventional 
alignment techniques in which alignment marks (not shown) 
are used to accurately overlay successive masks with pre- 
vious patterning steps, or more sophisticated techniques may 
be used. For example, interferometric techniques such as the 
one described in Flanders et al., "A New Interferometric 
Alignment Technique," App. Phys. Lett. (1977) 31:426-428, 
which is incorporated herein by reference, may be used. 

To enhance contrast of light applied to the substrate, it is 
desirable to provide contrast enhancement materials 
between the mask and the substrate according to some 
embodiments. This contrast enhancement layer may com- 
prise a molecule which is decomposed by light such as 
quinone diazide or a material which is transiently bleached at 
the wavelength of interest. Transient bleaching of materials 
will allow greater penetration where light is applied, thereby 
enhancing contrast. Alternatively, contrast enhancement 
may be provided by way of a cladded fiber optic bundle. 

The light may be from a conventional incandescent 
source, a laser, a laser diode, or the like. If non-collimated 
sources of light are used it may be desirable to provide a 
thick- or multi-layered mask to prevent spreading of the 
light onto the substrate. It may, further, be desirable in some 
embodiments to utilize groups which are sensitive to differ- 
ent wavelengths to control synthesis. For example, by using 
groups which are sensitive to different wavelengths, it is 
possible to select branch positions in the synthesis of a 
polymer or eliminate certain masking steps. Several reactive 
groups along with their corresponding wavelengths for 
deprotection are provided in Table 1. 



TABLE 1

Group                                     Approximate Deprotection Wavelength

Nitroveratryloxy carbonyl (NVOC)          UV (300-400 nm)
Nitrobenzyloxy carbonyl (NBOC)            UV (300-350 nm)
Dimethyl dimethoxybenzyloxy carbonyl      UV (280-300 nm)
5-Bromo-7-nitroindolinyl                  UV (420 nm)
o-Hydroxy-α-methyl cinnamoyl              UV (300-350 nm)
2-Oxymethylene anthraquinone              UV (350 nm)
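
By way of illustration only, the wavelength-selective use of the groups in Table 1 can be sketched as a simple lookup. The Python fragment below is an assumption-laden sketch and not part of the disclosed apparatus: the dictionary name, the function name, and the treatment of each tabulated range as a sharp window are hypothetical simplifications (real absorption bands are not sharp-edged).

```python
# Approximate deprotection windows transcribed from Table 1, in nm.
# A single tabulated wavelength is stored as a zero-width window.
DEPROTECTION_NM = {
    "NVOC": (300, 400),
    "NBOC": (300, 350),
    "Dimethyl dimethoxybenzyloxy carbonyl": (280, 300),
    "5-Bromo-7-nitroindolinyl": (420, 420),
    "o-Hydroxy-alpha-methyl cinnamoyl": (300, 350),
    "2-Oxymethylene anthraquinone": (350, 350),
}


def groups_removed_at(wavelength_nm, table=DEPROTECTION_NM):
    """Return, sorted by name, the groups whose tabulated window contains wavelength_nm."""
    return sorted(
        name for name, (low, high) in table.items() if low <= wavelength_nm <= high
    )
```

Under these assumptions, a query at 420 nm returns only the 5-bromo-7-nitroindolinyl group, which is the kind of selectivity that would allow branch positions to be chosen or masking steps to be eliminated.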





While the invention is illustrated primarily herein by way 
of the use of a mask to illuminate selected regions of the 
substrate, other techniques may also be used. For example, 
the substrate may be translated under a modulated laser or 
diode light source. Such techniques are discussed in, for 
example, U.S. Pat. No. 4,719,615 (Feyrer et al.), which is 
incorporated herein by reference. In alternative embodi- 
ments a laser galvanometric scanner is utilized. In other 
embodiments, the synthesis may take place on or in contact 
with a conventional liquid crystal (referred to herein as a 
"light valve") or fiber optic light sources. By appropriately 
modulating liquid crystals, light may be selectively con- 
trolled so as to permit light to contact selected regions of the 
substrate. Alternatively, synthesis may take place on the end 
of a series of optical fibers to which light is selectively 
applied. Other means of controlling the location of light 
exposure will be apparent to those of skill in the art. 

The substrate may be irradiated either in contact or not in 
contact with a solution (not shown) and is, preferably, 
irradiated in contact with a solution. The solution contains 
reagents to prevent the by-products formed by irradiation 
from interfering with synthesis of the polymer according to 
some embodiments. Such by-products might include, for 
example, carbon dioxide, nitrosocarbonyl compounds, sty- 
rene derivatives, indole derivatives, and products of their 
photochemical reactions. Alternatively, the solution may 
contain reagents used to match the index of refraction of the 
substrate. Reagents added to the solution may further 
include, for example, acidic or basic buffers, thiols, substi- 
tuted hydrazines and hydroxylamines, reducing agents (e.g., 
NADH) or reagents known to react with a given functional 
group (e.g., aryl nitroso + glyoxylic acid → aryl 
formhydroxamate + CO2). 

Either concurrently with or after the irradiation step, the 
linker molecules are washed or otherwise contacted with a 
first monomer, illustrated by "A" in regions 12a and 12b in 
FIG. 2. The first monomer reacts with the activated func- 
tional groups of the linkage molecules which have been 
exposed to light. The first monomer, which is preferably an 
amino acid, is also provided with a photoprotective group. 
The photoprotective group on the monomer may be the same 
as or different than the protective group used in the linkage 
molecules, and may be selected from any of the above- 
described protective groups. In one embodiment, the pro- 
tective group for the A monomer is selected from the group 
NBOC and NVOC. 

As shown in FIG. 3, the process of irradiating is thereafter 
repeated, with a mask repositioned so as to remove linkage 
protective groups and expose functional groups in regions 
14a and 14b which are illustrated as being regions which 
were protected in the previous masking step. As an alterna- 
tive to repositioning of the first mask, in many embodiments 
a second mask will be utilized. In other alternative 
embodiments, some steps may provide for illuminating a 
common region in successive steps. As shown in FIG. 3, it 
may be desirable to provide separation between irradiated 
regions. For example, separation of about 1-5 μm may be 
appropriate to account for alignment tolerances. 

As shown in FIG. 4, the substrate is then exposed to a 
second protected monomer "B," producing B regions 16a 
and 16b. Thereafter, the substrate is again masked so as to 
remove the protective groups and expose reactive groups on 
A region 12a and B region 16b. The substrate is again 
exposed to monomer B, resulting in the formation of the 
structure shown in FIG. 6. The dimers B-A and B-B have 
been produced on the substrate. 

A subsequent series of masking and contacting steps 
similar to those described above with A (not shown) pro- 
vides the structure shown in FIG. 7. The process provides all 
possible dimers of B and A, i.e., B-A, A-B, A-A, and B-B. 
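
The masking sequence just described may be followed more easily as a short simulation. The Python sketch below is illustrative only: the function names and the dictionary representation of the substrate are hypothetical, the region labels are borrowed from FIGS. 2 to 7, and each coupling step is assumed to go to completion in every exposed region and nowhere else.

```python
def couple(substrate, exposed_regions, monomer):
    """Add `monomer` to the chain in each exposed region; masked regions are unchanged."""
    for region in exposed_regions:
        substrate[region].append(monomer)


def chain_name(chain):
    """Write the most recently coupled monomer first, so B coupled onto A reads 'B-A'."""
    return "-".join(reversed(chain))


# Four regions, each starting with a bare (deprotected-on-demand) linker.
substrate = {"12a": [], "12b": [], "16a": [], "16b": []}

couple(substrate, ["12a", "12b"], "A")  # first exposure, monomer A
couple(substrate, ["16a", "16b"], "B")  # second exposure, monomer B
couple(substrate, ["12a", "16b"], "B")  # third exposure, monomer B again
couple(substrate, ["12b", "16a"], "A")  # final exposure, monomer A again

dimers = {region: chain_name(chain) for region, chain in substrate.items()}
```

Four exposures over four regions are enough, under these assumptions, to leave one of each possible dimer of A and B on the substrate.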

The substrate, the area of synthesis, and the area for 
synthesis of each individual polymer could be of any size or 
shape. For example, squares, ellipsoids, rectangles, 
triangles, circles, or portions thereof, along with irregular 
geometric shapes, may be utilized. Duplicate synthesis areas 
may also be applied to a single substrate for purposes of 
redundancy. 

In one embodiment the regions 12 and 16 on the substrate 
will have a surface area of between about 1 cm² and 10⁻¹⁰ 
cm². In some embodiments the regions 12 and 16 have areas 
of less than about 10⁻¹ cm², 10⁻² cm², 10⁻³ cm², 10⁻⁴ cm², 
10⁻⁵ cm², 10⁻⁶ cm², 10⁻⁷ cm², 10⁻⁸ cm², or 10⁻¹⁰ cm². In 
a preferred embodiment, the regions 12 and 16 are between 
about 10×10 μm and 500×500 μm. 
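
As a rough worked example of what these region sizes imply, the following Python sketch counts how many square regions of a given side would tile a 1 cm × 1 cm synthesis area. The function name is hypothetical, and the calculation assumes whole-micrometre sides and no separation between regions, which is a simplification.

```python
def regions_per_cm2(side_um):
    """Count square regions of side `side_um` (micrometres) that tile 1 cm x 1 cm.

    Assumes an integer side length and ignores any gap left between regions.
    """
    per_edge = 10_000 // side_um  # 1 cm = 10,000 micrometres
    return per_edge * per_edge
```

On these assumptions, 500 μm regions give 400 sites per square centimetre and 10 μm regions give 1,000,000.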

In some embodiments a single substrate supports more 
than about 10 different monomer sequences and preferably 
more than about 100 different monomer sequences, although 
in some embodiments more than about 10³, 10⁴, 10⁵, 10⁶, 
10⁷, or 10⁸ different sequences are provided on a substrate. 
Of course, within a region of the substrate in which a 
monomer sequence is synthesized, it is preferred that the 
monomer sequence be substantially pure. In some 
embodiments, regions of the substrate contain polymer 
sequences which are at least about 1%, 5%, 10%, 15%, 20%, 
25%, 30%, 35%, 40%, 45%, 50%, 60%, 70%, 80%, 90%, 
95%, 96%, 97%, 98%, or 99% pure. 

According to some embodiments, several sequences are 
intentionally provided within a single region so as to provide 
an initial screening for biological activity, after which mate- 
rials within regions exhibiting significant binding are further 
evaluated. 

IV. Details of One Embodiment of a Reactor 
System 

FIG. 8A schematically illustrates a preferred embodiment 
of a reactor system 100 for synthesizing polymers on the 
prepared substrate in accordance with one aspect of the 
invention. The reactor system includes a body 102 with a 
cavity 104 on a surface thereof. In preferred embodiments 
the cavity 104 is between about 50 and 1000 μm deep with 
a depth of about 500 μm preferred. 

The bottom of the cavity is preferably provided with an 
array of ridges 106 which extend both into the plane of the 
Figure and parallel to the plane of the Figure. The ridges are 
preferably about 50 to 200 μm deep and spaced at about 2 
to 3 mm. The purpose of the ridges is to generate turbulent 
flow for better mixing. The bottom surface of the cavity is 
preferably light absorbing so as to prevent reflection of 
impinging light. 

A substrate 112 is mounted above the cavity 104. The 
substrate is provided along its bottom surface 114 with a 
photoremovable protective group such as NVOC with or 
without an intervening linker molecule. The substrate is 
preferably transparent to a wide spectrum of light, but in 
some embodiments is transparent only at a wavelength at 
which the protective group may be removed (such as UV in 
the case of NVOC). The substrate in some embodiments is 
a conventional microscope glass slide or cover slip. The 
substrate is preferably as thin as possible, while still pro- 
viding adequate physical support. Preferably, the substrate is 
less than about 1 mm thick, more preferably less than 0.5 
mm thick, more preferably less than 0.1 mm thick, and most 
preferably less than 0.05 mm thick. In alternative preferred 
embodiments, the substrate is quartz or silicon. 

The substrate and the body serve to seal the cavity except 
for an inlet port 108 and an outlet port 110. The body and the 
substrate may be mated for sealing in some embodiments 
with one or more gaskets. According to a preferred 
embodiment, the body is provided with two concentric 
gaskets and the intervening space is held at vacuum to 
ensure mating of the substrate to the gaskets. 

Fluid is pumped through the inlet port into the cavity by 
way of a pump 116 which may be, for example, a model no. 
B-120-S made by Eldex Laboratories. Selected fluids are 
circulated into the cavity by the pump, through the cavity, 
and out the outlet for recirculation or disposal. The reactor 
may be subjected to ultrasonic radiation and/or heated to aid 
in agitation in some embodiments. 

Above the substrate 112, a lens 120 is provided which 
may be, for example, a 2" 100 mm focal length fused silica 
lens. For the sake of a compact system, a reflective mirror 
122 may be provided for directing light from a light source 
124 onto the substrate. Light source 124 may be, for 
example, a Xe(Hg) light source manufactured by Oriel and 
having model no. 66024. A second lens 126 may be provided 
for the purpose of projecting a mask image onto the substrate 
in combination with lens 120. This form of lithography is 
referred to herein as projection printing. As will be apparent 
from this disclosure, proximity printing and the like may 
also be used according to some embodiments. 

Light from the light source is permitted to reach only 
selected locations on the substrate as a result of mask 128. 
Mask 128 may be, for example, a glass slide having etched 
chrome thereon. The mask 128 in one embodiment is 
provided with a grid of transparent locations and opaque 
locations. Such masks may be manufactured by, for 
example, Photo Sciences, Inc. Light passes freely through 
the transparent regions of the mask, but is reflected from or 
absorbed by other regions. Therefore, only selected regions 
of the substrate are exposed to light. 

As discussed above, light valves (LCD's) may be used as 
an alternative to conventional masks to selectively expose 
regions of the substrate. Alternatively, fiber optic faceplates 
such as those available from Schott Glass, Inc., may be used 
for the purpose of contrast enhancement of the mask or as 
the sole means of restricting the region to which light is 
applied. Such faceplates would be placed directly above or 
on the substrate in the reactor shown in FIG. 8A. In still 
further embodiments, fly's-eye lenses, tapered fiber optic 
faceplates, or the like, may be used for contrast enhance- 
ment. 

In order to provide for illumination of regions smaller 
than a wavelength of light, more elaborate techniques may 
be utilized. For example, according to one preferred 
embodiment, light is directed at the substrate by way of 
molecular microcrystals on the tip of, for example, micropi- 
pettes. Such devices are disclosed in Lieberman et al., "A 
Light Source Smaller Than the Optical Wavelength," Sci- 
ence (1990) 247:59-61, which is incorporated herein by 
reference for all purposes. 

In operation, the substrate is placed on the cavity and 
sealed thereto. All operations in the process of preparing the 
substrate are carried out in a room lit primarily or entirely by 
light of a wavelength outside of the light range at which the 
protective group is removed. For example, in the case of 
NVOC, the room should be lit with a conventional dark 
room light which provides little or no UV light. All opera- 
tions are preferably conducted at about room temperature. 

A first, deprotection fluid (without a monomer) is circu- 
lated through the cavity. The solution is preferably 5 mM 
sulfuric acid in dioxane, which serves to keep 
exposed amino groups protonated and decreases their reac- 
tivity with photolysis by-products. Absorptive materials 
such as N,N-diethylamino 2,4-dinitrobenzene, for example, 
may be included in the deprotection fluid to 
absorb light and prevent reflection and unwanted photolysis. 

The slide is, thereafter, positioned in a light raypath from 
the mask such that first locations on the substrate are 
illuminated and, therefore, deprotected. In preferred 
embodiments the substrate is illuminated for between about 
1 and 15 minutes with a preferred illumination time of about 
10 minutes at 10-20 mW/cm² with 365 nm light. The slides 
are neutralized (i.e., brought to a pH of about 7) after 
photolysis with, for example, a solution of 
di-isopropylethylamine (DIEA) in methylene chloride for 
about 5 minutes. 
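
For orientation, the illumination conditions above can be converted to an energy dose per unit area. The Python sketch below is an illustrative calculation only; the function name is hypothetical and the irradiance is assumed to be constant over the exposure.

```python
def exposure_dose_j_per_cm2(irradiance_mw_per_cm2, minutes):
    """Energy per unit area delivered by a constant irradiance, in J/cm^2.

    mW/cm^2 multiplied by seconds gives mJ/cm^2; dividing by 1000 gives J/cm^2.
    """
    return irradiance_mw_per_cm2 * minutes * 60 / 1000
```

On that assumption, the preferred 10 minute exposure at 10-20 mW/cm² corresponds to roughly 6 to 12 J/cm² of 365 nm light.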

The first monomer is then placed at the first locations on 
the substrate. After irradiation, the slide is removed, treated 
in bulk, and then reinstalled in the flow cell. Alternatively,
a fluid containing the first monomer, preferably also protected
by a protective group, is circulated through the cavity
by way of pump 116. If, for example, it is desired to attach
the amino acid Y to the substrate at the first locations, the
amino acid Y (bearing a protective group on its α-nitrogen),
along with reagents used to render the monomer reactive,
and/or a carrier, is circulated from a storage container 118,
through the pump, through the cavity, and back to the inlet
of the pump.

The monomer carrier solution is, in a preferred
embodiment, formed by mixing of a first solution (referred
to herein as solution "A") and a second solution (referred to
herein as solution "B"). Table 2 provides an illustration of a
mixture which may be used for solution A.

TABLE 2

Representative Monomer Carrier Solution "A"

100 mg NVOC amino protected amino acid
[entries illegible in source]
86 μl DIEA (Diisopropylethylamine)

The composition of solution B is illustrated in Table 3.
Solutions A and B are mixed and allowed to react at room
temperature for about 8 minutes, then diluted with 2 ml of
DMF, and 500 μl are applied to the surface of the slide or the
solution is circulated through the reactor system and allowed
to react for about 2 hours at room temperature. The slide is
then washed with DMF, methylene chloride and ethanol.

TABLE 3

Representative Monomer Carrier Solution "B"

[quantity illegible in source] DMF
111 mg BOP (Benzotriazolyl-n-oxy-tris(dimethylamino)
phosphoniumhexafluorophosphate)

As the solution containing the monomer to be attached is
circulated through the cavity, the amino acid or other monomer
will react at its carboxy terminus with amino groups on
the regions of the substrate which have been deprotected. Of
course, while the invention is illustrated by way of circulation
of the monomer through the cavity, the invention could
be practiced by way of removing the slide from the reactor
and submersing it in an appropriate monomer solution.

After addition of the first monomer, the solution containing
the first amino acid is then purged from the system. After
circulation of a sufficient amount of the DMF/methylene
chloride such that removal of the amino acid can be assured
(e.g., about 50 times the volume of the cavity and carrier
lines), the mask or substrate is repositioned, or a new mask
is utilized such that second regions on the substrate will be
exposed to light and the light 124 is engaged for a second
exposure. This will deprotect second regions on the substrate
and the process is repeated until the desired polymer
sequences have been synthesized.

The entire derivatized substrate is then exposed to a
receptor of interest, preferably labeled with, for example, a
fluorescent marker, by circulation of a solution or suspension
of the receptor through the cavity or by contacting the
surface of the slide in bulk. The receptor will preferentially
bind to certain regions of the substrate which contain
complementary sequences.

Antibodies are typically suspended in what is commonly
referred to as "supercocktail," which may be, for example,
a solution of about 1% BSA (bovine serum albumin), 0.5%
Tween in PBS (phosphate buffered saline) buffer. The antibodies
are diluted into the supercocktail buffer to a final
concentration of, for example, about 0.1 to 4 μg/ml.

FIG. 8B illustrates an alternative preferred embodiment of
the reactor shown in FIG. 8A. According to this
embodiment, the mask 128 is placed directly in contact with
the substrate. Preferably, the etched portion of the mask is
placed face down so as to reduce the effects of light
dispersion. According to this embodiment, the imaging
lenses 120 and 126 are not necessary because the mask is
brought into close proximity to the substrate.

For purposes of increasing the signal-to-noise ratio of the
technique, some embodiments of the invention provide for
exposure of the substrate to a first labeled or unlabeled
receptor followed by exposure of a labeled, second receptor
(e.g., an antibody) which binds at multiple sites on the first
receptor. If, for example, the first receptor is an antibody
derived from a first species of an animal, the second receptor
is an antibody derived from a second species directed to
epitopes associated with the first species. In the case of a
mouse antibody, for example, fluorescently labeled goat
antibody or antiserum which is antimouse may be used to
bind at multiple sites on the mouse antibody, providing
several times the fluorescence compared to the attachment of
a single mouse antibody at each binding site. This process
may be repeated again with additional antibodies (e.g.,
goat-mouse-goat, etc.) for further signal amplification.

In preferred embodiments an ordered sequence of masks
is utilized. In some embodiments it is possible to use as few
as a single mask to synthesize all of the possible polymers
of a given monomer set.

If, for example, it is desired to synthesize all 16 dinucleotides
from four bases, a 1 cm square synthesis region is
divided conceptually into 16 boxes, each 0.25 cm wide.
Denote the four monomer units by A, B, C, and D. The first
reactions are carried out in four vertical columns, each 0.25
cm wide. The first mask exposes the left-most column of
boxes, where A is coupled. The second mask exposes the
next column, where B is coupled; followed by a third mask,
for the C column; and a final mask that exposes the right-most
column, for D. The first, second, third, and fourth
masks may be a single mask translated to different locations.

The process is repeated in the horizontal direction for the
second unit of the dimer. This time, the masks allow
exposure of horizontal rows, again 0.25 cm wide. A, B, C,
and D are sequentially coupled using masks that expose
horizontal fourths of the reaction area. The resulting substrate
contains all 16 dinucleotides of four bases.

The eight masks used to synthesize the dinucleotides are
related to one another by translation or rotation. In fact, one
mask can be used in all eight steps if it is suitably rotated and
translated. For example, in the example above, a mask with
a single transparent region could be sequentially used to
expose each of the vertical columns, rotated 90°, and then
sequentially used to allow exposure of the horizontal rows.

Tables 4 and 5 provide a simple computer program in
Quick Basic for planning a masking program and a sample
output, respectively, for the synthesis of a polymer chain of
three monomers ("residues") having three different monomers
in the first level, four different monomers in the second
level, and five different monomers in the third level in a
striped pattern. The output of the program is the number of
cells, the number of "stripes" (light regions) on each mask,
and the amount of translation required for each exposure of
the mask.
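
The sixteen-dinucleotide scheme described above, in which one stripe mask is translated for the columns and then rotated for the rows, can be sketched in a few lines of Python. This is an illustrative model only: the grid representation, the function names, and the assumption of complete coupling in every exposed box are hypothetical and are not taken from the disclosure.

```python
MONOMERS = "ABCD"
N = len(MONOMERS)


def column_mask(i):
    """Mask with a single transparent vertical stripe over column i (True = exposed)."""
    return [[col == i for col in range(N)] for _row in range(N)]


def rotate90(mask):
    """Rotate a square mask by 90 degrees, turning a column stripe into a row stripe."""
    return [list(row) for row in zip(*mask[::-1])]


def couple(grid, mask, monomer):
    """Append `monomer` to the sequence in every exposed box of the grid."""
    for r in range(N):
        for c in range(N):
            if mask[r][c]:
                grid[r][c] += monomer


grid = [["" for _ in range(N)] for _ in range(N)]

# First unit: translate the stripe across the four vertical columns.
for i, monomer in enumerate(MONOMERS):
    couple(grid, column_mask(i), monomer)

# Second unit: the same stripe, rotated, exposes the four horizontal rows.
for i, monomer in enumerate(MONOMERS):
    couple(grid, rotate90(column_mask(i)), monomer)

dimers = {cell for row in grid for cell in row}
```

Eight exposures with what is effectively one mask leave sixteen distinct two-letter sequences, one per box, which is the point made in the text.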

TABLE 4

Mask Strategy Program

DEFINT A-Z
DIM b(20), w(20), l(500)
'f$ names the output file or device and must be assigned before OPEN
OPEN f$ FOR OUTPUT AS #1
jmax = 3 'Number of residues
b(1) = 3: b(2) = 4: b(3) = 5 'Number of building blocks for res 1, 2, 3
g = 1: lmax(1) = 1
FOR j = 1 TO jmax: g = g * b(j): NEXT j
w(0) = 0: w(1) = g / b(1)
PRINT #1, "MASK2.BAS", DATE$, TIME$: PRINT #1,
PRINT #1, USING "Number of residues=##"; jmax
FOR j = 1 TO jmax
  PRINT #1, USING " Residue ## ## building blocks"; j; b(j)
NEXT j
PRINT #1,
PRINT #1, USING "Number of cells=####"; g: PRINT #1,
FOR j = 2 TO jmax
  lmax(j) = lmax(j - 1) * b(j - 1)
  w(j) = w(j - 1) / b(j)
NEXT j
FOR j = 1 TO jmax
  PRINT #1, USING "Mask for residue ##"; j: PRINT #1,
  PRINT #1, USING " Number of stripes=###"; lmax(j)
  PRINT #1, USING " Width of each stripe=###"; w(j)
  FOR l = 1 TO lmax(j)
    a = 1 + (l - 1) * w(j - 1)
    ae = a + w(j) - 1
    PRINT #1, USING " Stripe ## begins at location ### and ends at ###"; l; a; ae
  NEXT l
  PRINT #1,
  PRINT #1, USING " For each of ## building blocks, translate mask by ## cell(s)"; b(j); w(j)
  PRINT #1, : PRINT #1, : PRINT #1,
NEXT j

© Copyright 1990, Affymax N.V.

TABLE 5

Masking Strategy Output

Number of residues= 3
 Residue 1  3 building blocks
 Residue 2  4 building blocks
 Residue 3  5 building blocks
Number of cells= 60

Mask for residue 1
 Number of stripes= 1
 Width of each stripe= 20
 Stripe 1 begins at location 1 and ends at 20
 For each of 3 building blocks, translate mask by 20 cell(s)

Mask for residue 2
 Number of stripes= 3
 Width of each stripe= 5
 Stripe 1 begins at location 1 and ends at 5
 Stripe 2 begins at location 21 and ends at 25
 Stripe 3 begins at location 41 and ends at 45
 For each of 4 building blocks, translate mask by 5 cell(s)

Mask for residue 3
 Number of stripes= 12
 Width of each stripe= 1
 Stripe 1 begins at location 1 and ends at 1
 Stripe 2 begins at location 6 and ends at 6
 Stripe 3 begins at location 11 and ends at 11
 Stripe 4 begins at location 16 and ends at 16
 Stripe 5 begins at location 21 and ends at 21
 Stripe 6 begins at location 26 and ends at 26
 Stripe 7 begins at location 31 and ends at 31
 Stripe 8 begins at location 36 and ends at 36
 Stripe 9 begins at location 41 and ends at 41
 Stripe 10 begins at location 46 and ends at 46
 Stripe 11 begins at location 51 and ends at 51
 Stripe 12 begins at location 56 and ends at 56
 For each of 5 building blocks, translate mask by 1 cell(s)

© Copyright 1990, Affymax N.V.
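
For readers without a Quick Basic interpreter, the stripe arithmetic of Table 4 can be re-expressed as follows. This Python version is a sketch written for illustration and is not the original program: the function name and the returned data structure are hypothetical, it returns values instead of printing a report, and it assumes the cell count divides evenly at each level so that integer division matches the listing.

```python
def mask_plan(blocks):
    """Plan striped masks for a polymer with blocks[j-1] building blocks at residue j.

    Returns (cells, plan). Each plan entry gives the residue number, the stripe
    width in cells, the (start, end) cell of each stripe (1-based, inclusive),
    and how far to translate the mask between building blocks.
    """
    cells = 1
    for b in blocks:
        cells *= b

    n = len(blocks)
    width = [0] * (n + 1)    # width[j]: stripe width for residue j; width[0] = 0
    stripes = [0] * (n + 1)  # stripes[j]: number of stripes on the mask for residue j
    width[1] = cells // blocks[0]
    stripes[1] = 1
    for j in range(2, n + 1):
        stripes[j] = stripes[j - 1] * blocks[j - 2]
        width[j] = width[j - 1] // blocks[j - 1]

    plan = []
    for j in range(1, n + 1):
        spans = []
        for k in range(1, stripes[j] + 1):
            start = 1 + (k - 1) * width[j - 1]
            spans.append((start, start + width[j] - 1))
        plan.append({
            "residue": j,
            "width": width[j],
            "stripes": spans,
            "translate_by": width[j],
            "exposures": blocks[j - 1],
        })
    return cells, plan
```

Called as `mask_plan([3, 4, 5])`, the sketch gives 60 cells and the same stripe positions as the sample output in Table 5.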



V. Details of One Embodiment of A Fluorescent 
Detection Device 

FIG. 9 illustrates a fluorescent detection device for detect- 
ing fluorescently labeled receptors on a substrate. A sub- 
strate 112 is placed on an x/y translation table 202. In a 
preferred embodiment the x/y translation table is a model no. 
PM500-A1 manufactured by Newport Corporation. The x/y 
translation table is connected to and controlled by an appro- 
priately programmed digital computer 204 which may be, 
for example, an appropriately programmed IBM PC/AT or 
AT compatible computer. Of course, other computer 
systems, special purpose hardware, or the like could readily 
be substituted for the AT computer used herein for illustra- 
tion. Computer software for the translation and data collec- 
tion functions described herein can be provided based on 
commercially available software including, for example, 
"Lab Windows" licensed by National Instruments, which is 
incorporated herein by reference for all purposes. 

The substrate and x/y translation table are placed under a 
microscope 206 which includes one or more objectives 208. 
Light (about 488 nm) from a laser 210, which in some 
embodiments is a model no. 2020-05 argon ion laser manu- 
factured by Spectraphysics, is directed at the substrate by a 
dichroic mirror 207 which passes greater than about 520 nm 
light but reflects 488 nm light. Dichroic mirror 207 may be, 
for example, a model no. FT510 manufactured by Carl 
Zeiss. Light reflected from the mirror then enters the micro- 
scope 206 which may be, for example, a model no. Axioscop 
20 manufactured by Carl Zeiss. Fluorescein-marked mate- 
rials on the substrate will fluoresce >488 nm light, and the 
fluoresced light will be collected by the microscope and 
passed through the mirror. The fluorescent light from the 
substrate is then directed through a wavelength filter 209 
and, thereafter, through an aperture plate 211. Wavelength 
filter 209 may be, for example, a model no. OG530 manu- 
factured by Melles Griot and aperture plate 211 may be, for 
example, a model no. 477352/477380 manufactured by Carl 
Zeiss. 

The fluoresced light then enters a photomultiplier tube 
212 which in some embodiments is a model no. R943-02 
manufactured by Hamamatsu; the signal is amplified in 
preamplifier 214 and photons are counted by photon counter 
216. The number of photons is recorded as a function of the 
location in the computer 204. Pre-Amp 214 may be, for 
example, a model no. SR440 manufactured by Stanford 
Research Systems and photon counter 216 may be a model 
no. SR400 manufactured by Stanford Research Systems. 
The substrate is then moved to a subsequent location and the 
process is repeated. In preferred embodiments the data are 
acquired every 1 to 100 μm with a data collection diameter 
of about 0.8 to 10 μm preferred. In embodiments with 
sufficiently high fluorescence, a CCD detector with broad- 
field illumination is utilized. 

By counting the number of photons generated in a given 
area in response to the laser, it is possible to determine where 
fluorescent marked molecules are located on the substrate. 
Consequently, for a slide which has a matrix of polypeptides, 
for example, synthesized on the surface thereof, it is possible 
to determine which of the polypeptides is complementary to 
a fluorescently marked receptor. 

According to preferred embodiments, the intensity and 
duration of the light applied to the substrate is controlled by 
varying the laser power and scan stage rate for improved 
signal-to-noise ratio by maximizing fluorescence emission 
and minimizing background noise. 

While the detection apparatus has been illustrated prima- 
rily herein with regard to the detection of marked receptors, 
the invention will find application in other areas. For 
example, the detection apparatus disclosed herein could be 
used in the fields of catalysis, DNA or protein gel scanning, 
and the like. 

VI. Determination of Relative Binding Strength of 
Receptors 

The signal-to-noise ratio of the present invention is suf- 
ficiently high that not only can the presence or absence of a 
receptor on a ligand be detected, but also the relative binding 
affinity of receptors to a variety of sequences can be deter- 
mined. 

In practice it is found that a receptor will bind to several 
peptide sequences in an array, but will bind much more 
strongly to some sequences than others. Strong binding 
affinity will be evidenced herein by a strong fluorescent or 
radiographic signal since many receptor molecules will bind 
in a region of a strongly bound ligand. Conversely, a weak 
binding affinity will be evidenced by a weak fluorescent or 
radiographic signal due to the relatively small number of 
receptor molecules which bind in a particular region of a 
substrate having a ligand with a weak binding affinity for the 
receptor. Consequently, it becomes possible to determine 
relative binding avidity (or affinity in the case of univalent 
interactions) of a ligand herein by way of the intensity of a 
fluorescent or radiographic signal in a region containing that 
ligand. 

Semiquantitative data on affinities might also be obtained 
by varying washing conditions and concentrations of the 
receptor. This would be done by comparison to known 
ligand receptor pairs, for example. 

VII. Examples 

The following examples are provided to illustrate the 
efficacy of the inventions herein. All operations were con- 
ducted at about ambient temperatures and pressures unless 
indicated to the contrary. 
A. Slide Preparation 

Before attachment of reactive groups it is preferred to
clean the substrate which is, in a preferred embodiment, a
glass substrate such as a microscope slide or cover slip.
According to one embodiment the slide is soaked in an
alkaline bath consisting of, for example, 1 liter of 95%
ethanol with 120 ml of water and 120 grams of sodium
hydroxide for 12 hours. The slides are then washed under
running water and allowed to air dry, and rinsed once with
a solution of 95% ethanol.

The slides are then aminated with, for example,
aminopropyltriethoxysilane for the purpose of attaching amino
groups to the glass surface on linker molecules, although any
omega functionalized silane could also be used for this
purpose. In one embodiment 0.1% aminopropyltriethoxysilane
is utilized, although solutions with concentrations from
10^-7 % to 10% may be used, with about 10^-3 % to 2%
preferred. A 0.1% mixture is prepared by adding to 100 ml
of a 95% ethanol/5% water mixture, 100 microliters (µl) of
aminopropyltriethoxysilane. The mixture is agitated at about
ambient temperature on a rotary shaker for about 5 minutes.
500 µl of this mixture is then applied to the surface of one
side of each cleaned slide. After 4 minutes, the slides are
decanted of this solution and rinsed three times by dipping
in, for example, 100% ethanol.

After the plates dry, they are placed in a 110-120° C.
vacuum oven for about 20 minutes, and then allowed to cure
at room temperature for about 12 hours in an argon
environment. The slides are then dipped into DMF
(dimethylformamide) solution, followed by a thorough
washing with methylene chloride.

The aminated surface of the slide is then exposed to about
500 µl of, for example, a 30 millimolar (mM) solution of
NVOC-GABA (gamma amino butyric acid) NHS
(N-hydroxysuccinimide) in DMF for attachment of a
NVOC-GABA to each of the amino groups.

The surface is washed with, for example, DMF, methylene
chloride, and ethanol.

Any unreacted aminopropyl silane on the surface (that is,
those amino groups which have not had the NVOC-GABA
attached) are now capped with acetyl groups (to prevent
further reaction) by exposure to a 1:3 mixture of acetic
anhydride in pyridine for 1 hour. Other materials which may
perform this residual capping function include trifluoroacetic
anhydride, formicacetic anhydride, or other reactive
acylating agents. Finally, the slides are washed again with
DMF, methylene chloride, and ethanol.

B. Synthesis of Eight Trimers of "A" and "B"

FIG. 10 illustrates a possible synthesis of the eight trimers
of the two-monomer set: gly, phe (represented by "A" and
"B," respectively). A glass slide bearing silane groups
terminating in 6-nitroveratryloxycarboxamide (NVOC-NH)
residues is prepared as a substrate. Active esters
(pentafluorophenyl, OBt, etc.) of gly and phe protected at the
amino group with NVOC are prepared as reagents. While
not pertinent to this example, if side chain protecting groups
are required for the monomer set, these must not be
photoreactive at the wavelength of light used to protect the
primary chain.

For a monomer set of size n, n×l cycles are required to
synthesize all possible sequences of length l. A cycle
consists of:

1. Irradiation through an appropriate mask to expose the
amino groups at the sites where the next residue is to be
added, with appropriate washes to remove the
by-products of the deprotection.

2. Addition of a single activated and protected (with the
same photochemically-removable group) monomer,
which will react only at the sites addressed in step 1,
with appropriate washes to remove the excess reagent
from the surface.

The above cycle is repeated for each member of the
monomer set until each location on the surface has been
extended by one residue in one embodiment. In other
embodiments, several residues are sequentially added at one
location before moving on to the next location. Cycle times
will generally be limited by the coupling reaction rate, now
as short as 20 min in automated peptide synthesizers. This
step is optionally followed by addition of a protecting group
to stabilize the array for later testing. For some types of
polymers (e.g., peptides), a final deprotection of the entire
surface (removal of photoprotective side chain groups) may
be required.

More particularly, as shown in FIG. 10A, the glass 20 is
provided with regions 22, 24, 26, 28, 30, 32, 34, and 36.
Regions 30, 32, 34, and 36 are masked, as shown in FIG.
10B and the glass is irradiated and exposed to a reagent
containing "A" (e.g., gly), with the resulting structure shown
in FIG. 10C. Thereafter, regions 22, 24, 26, and 28 are
masked, the glass is irradiated (as shown in FIG. 10D) and
exposed to a reagent containing "B" (e.g., phe), with the
resulting structure shown in FIG. 10E. The process
proceeds, consecutively masking and exposing the sections
as shown until the structure shown in FIG. 10M is obtained.
The glass is irradiated and the terminal groups are,
optionally, capped by acetylation. As shown, all possible
trimers of gly/phe are obtained.

In this example, no side chain protective group removal is
necessary. If it is desired, side chain deprotection may be
accomplished by treatment with ethanedithiol and
trifluoroacetic acid.

In general, the number of steps needed to obtain a
particular polymer chain is defined by:

$n \times l$ (1)

where:
n = the number of monomers in the basis set of monomers,
and
l = the number of monomer units in a polymer chain.

Conversely, the synthesized number of sequences of
length l will be:

$n^{l}$ (2)

Of course, greater diversity is obtained by using masking
strategies which will also include the synthesis of polymers
having a length of less than l. If, in the extreme case, all
polymers having a length less than or equal to l are
synthesized, the number of polymers synthesized will be:

$n^{l} + n^{l-1} + \cdots + n^{1}$ (3)

The maximum number of lithographic steps needed will
generally be n for each "layer" of monomers, i.e., the total
number of masks (and, therefore, the number of lithographic
steps) needed will be n×l. The size of the transparent mask
regions will vary in accordance with the area of the substrate
available for synthesis and the number of sequences to be
formed. In general, the size of the synthesis areas will be:

size of synthesis areas = (A)/(S)

where:
A is the total area available for synthesis; and
S is the number of sequences desired in the area.

It will be appreciated by those of skill in the art that the
above method could readily be used to simultaneously
produce thousands or millions of oligomers on a substrate
using the photolithographic techniques disclosed herein.
Consequently, the method results in the ability to practically
test large numbers of, for example, di, tri, tetra, penta, hexa,
hepta, octapeptides, dodecapeptides, or larger polypeptides
(or correspondingly, polynucleotides).
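
The counting relationships stated in this example (n×l lithographic steps, n^l sequences of length l, the sum of n^k for all lengths up to l, and a synthesis area of A/S) can be illustrated with a minimal Python sketch. The function names are hypothetical and chosen only for illustration; they do not appear in the specification.

```python
from itertools import product


def lithographic_steps(n: int, l: int) -> int:
    """Equation (1): n masking/coupling cycles for each of l layers."""
    return n * l


def sequences_of_length(n: int, l: int) -> int:
    """Equation (2): number of distinct sequences of exactly length l."""
    return n ** l


def sequences_up_to_length(n: int, l: int) -> int:
    """Equation (3): n^l + n^(l-1) + ... + n^1."""
    return sum(n ** k for k in range(1, l + 1))


def synthesis_area(total_area: float, num_sequences: int) -> float:
    """Size of each synthesis area, A / S (same units as total_area)."""
    return total_area / num_sequences


# Two-monomer set from the example: "A" = gly, "B" = phe.
monomers = ("A", "B")
# Enumerate every trimer that the n x l = 6 cycles would produce.
trimers = ["".join(p) for p in product(monomers, repeat=3)]
```

For the two-monomer, three-layer case of FIG. 10, `lithographic_steps(2, 3)` gives the six cycles and `trimers` lists the eight sequences, which matches the "eight trimers" named in the heading of this example.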
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The above example has illustrated the method by way of 
a manual example. It will of course be appreciated that 
automated or semi-automated methods could be used. The 
substrate would be mounted in a flow cell for automated 
addition and removal of reagents, to minimize the volume of 
reagents needed, and to more carefully control reaction 
conditions. Successive masks could be applied manually or 
automatically.
C. Synthesis of a Dimer of an Aminopropyl Group and a
Fluorescent Group



photons per fluorescein molecule that can be detected. For 
the curves illustrated in FIG. 11, this calculation indicates
that about 40 to 50 photons per fluorescein
molecule are detected.
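
The bead calibration described in this example (subtract the trace of an unlabeled bead, integrate the remaining decaying signal, and divide by the known number of fluorescein molecules per bead) can be sketched as follows. This is only an illustration under stated assumptions: the traces are taken to be equal-length lists of counts sampled at a fixed interval, and the function names are hypothetical.

```python
def integrated_counts(trace, dt=1.0):
    """Integrate a sampled photon-count trace (simple rectangle rule)."""
    return sum(trace) * dt


def photons_per_molecule(labeled, unlabeled, molecules_per_bead, dt=1.0):
    """Detected photons per fluorescein molecule for one calibration bead.

    labeled            -- counts versus time for a fluorescein-loaded bead
    unlabeled          -- counts versus time for a bead without fluorescein
    molecules_per_bead -- known fluorescein loading of the labeled bead
    """
    if len(labeled) != len(unlabeled):
        raise ValueError("traces must have the same number of samples")
    # Background subtraction, point by point.
    net = [a - b for a, b in zip(labeled, unlabeled)]
    return integrated_counts(net, dt) / molecules_per_bead
```

Applied to real traces such as those of FIGS. 11A to 11C with bead loadings of 7,000, 13,000 and 29,000 molecules, a calculation of this kind would be expected to return values in the range the text reports; no measured traces are reproduced here.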

E. Determination of the Number of Molecules Per Unit Area 
Aminopropylated glass microscope slides prepared 
according to the methods discussed above were utilized in 
order to establish the density of labeling of the slides. The 
free amino termini of the slides were reacted with FITC 
(fluorescein isothiocyanate) which forms a covalent linkage 



with the amino group. The slide is then scanned to count the

In synthesizing the dimer of an aminopropyl group and a



fluorescent group, a functionalized durapore membrane was 
used as a substrate. The durapore membrane was a
polyvinylidine difluoride with aminopropyl groups. The
aminopropyl groups were protected with the DDZ group by
reaction of the carbonyl chloride with the amino groups, a 
reaction readily known to those of skill in the art. The 
surface bearing these groups was placed in a solution of THF 
and contacted with a mask bearing a checkerboard pattern of 
1 mm opaque and transparent regions. The mask was 
exposed to ultraviolet light having a wavelength down to at 
least about 280 nm for about 5 minutes at ambient 
temperature, although a wide range of exposure times and
temperatures may be appropriate in various embodiments of 
the invention. For example, in one embodiment, an exposure 
time of between about 1 and 5000 seconds may be used at 
process temperatures of between -70 and +50° C. 

In one preferred embodiment, exposure times of between 
about 1 and 500 seconds at about ambient pressure are used. 
In some preferred embodiments, pressure above ambient is 
used to prevent evaporation. 
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number of fluorescent photons generated in a region which, 
using the estimated 40-50 photons per fluorescent molecule, 
enables the calculation of the number of molecules which 
are on the surface per unit area. 
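
The surface-density estimate described here amounts to dividing the background-subtracted photon count from a scanned region by the photons detected per molecule and by the area of the region. A minimal sketch follows, assuming the count has already been integrated and background-subtracted; the function names and the choice of square micrometers as the area unit are illustrative assumptions.

```python
def molecules_per_area(net_counts, area_um2, photons_per_molecule):
    """Estimated fluorescent molecules per square micrometer."""
    return net_counts / photons_per_molecule / area_um2


def density_range(net_counts, area_um2, low=40.0, high=50.0):
    """Bracket the density using the estimated 40-50 photons per molecule.

    More photons per molecule implies fewer molecules for the same count,
    so the 'high' yield gives the lower bound on density.
    """
    return (
        molecules_per_area(net_counts, area_um2, high),
        molecules_per_area(net_counts, area_um2, low),
    )
```

Returning a range rather than a single figure reflects that the photon yield per molecule is itself only an estimate.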

A slide with aminopropyl silane on its surface was 
immersed in a 1 mM solution of FITC in DMF for 1 hour at 
about ambient temperature. After reaction, the slide was 
washed twice with DMF and then washed with ethanol, 
water, and then ethanol again. It was then dried and stored 
in the dark until it was ready to be examined. 

Through the use of curves similar to those shown in FIG. 
11, and by integrating the fluorescent counts under the 
exponentially decaying signal, the number of free amino 
groups on the surface after derivatization was determined. It
was determined that slides with labeling densities of 1
fluorescein per 10^3 x 10^3 to ~2 x 2 nm could be reproducibly
made as the concentration of aminopropyltriethoxysilane
varied from 10^-5 % to 10^-1 %.

F. Removal of NVOC and Attachment of a Fluorescent 
Marker 

NVOC-GABA groups were attached as described above.
The entire surface of one slide was exposed to light so as to

The surface of the membrane was then washed for about
1 hour with a fluorescent label which included an active ester
bound to a chelate of a lanthanide. Wash times will vary over 
a wide range of values from about a few minutes to a few 
hours. These materials fluoresce in the red and the green 
visible region. After the reaction with the active ester in the
fluorophore was complete, the locations in which the fluo- 
rophore was bound could be visualized by exposing them to 
ultraviolet light and observing the red and the green fluo- 
rescence. It was observed that the derivatized regions of the 
substrate closely corresponded to the original pattern of the 
mask. 

D. Demonstration of Signal Capability 

Signal detection capability was demonstrated using a 
low-level standard fluorescent bead kit manufactured by 
Flow Cytometry Standards and having model no. 824. This
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expose a free amino group at the end of the gamma amino 
butyric acid. This slide, and a duplicate which was not 
exposed, were then exposed to fluorescein isothiocyanate 
(FITC). 

FIG. 12A illustrates the slide which was not exposed to 
light, but which was exposed to FITC. The units of the x axis 
are time and the units of the y axis are counts. The trace 
contains a certain amount of background fluorescence. The 
duplicate slide was exposed to 350 nm broadband illumination
for about 1 minute (12 mW/cm², ~350 nm illumination),
washed and reacted with FITC. The fluorescence
curves for this slide are shown in FIG. 12B. A large
increase in the level of fluorescence is observed, which 
indicates photolysis has exposed a number of amino groups 
on the surface of the slides for attachment of a fluorescent
marker.

kit includes 5.8 µm diameter beads, each impregnated with
a known number of fluorescein molecules.

One of the beads was placed in the illumination field on 
the scan stage as shown in FIG. 9 in a field of a laser spot 
which was initially shuttered. After being positioned in the 
illumination field, the photon detection equipment was 
turned on. The laser beam was unblocked and it interacted 
with the particle bead, which then fluoresced. Fluorescence 
curves of beads impregnated with 7,000; 13,000; and 29,000 
fluorescein molecules, are shown in FIGS. 11A, 11B, and 
11C respectively. On each curve, traces for beads without 
fluorescein molecules are also shown. These experiments 
were performed with 488 nm excitation, with 100 µW of
laser power. The light was focused through a 40 power 0.75 
NA objective. 

The fluorescence intensity in all cases started off at a high
value and then decreased exponentially. The fall-off in
intensity is due to photobleaching of the fluorescein molecules.
The traces of beads without fluorescein molecules
are used for background subtraction. The difference in the
initial exponential decay between labeled and nonlabeled
beads is integrated to give the total number of photon counts,
and this number is related to the number of molecules per
bead. Therefore, it is possible to deduce the number of

G. Use of a Mask in Removal of NVOC

The next experiment was performed with a 0.1% aminopropylated
slide. Light from a Hg-Xe arc lamp was imaged
onto the substrate through a laser-ablated chrome-on-glass
mask in direct contact with the substrate.

This slide was illuminated for approximately 5 minutes,
with 12 mW of 350 nm broadband light and then reacted
with the 1 mM FITC solution. It was put on the laser
detection scanning stage and a graph was plotted as a
two-dimensional representation of position versus fluorescence
intensity. The fluorescence intensity (in counts) as a
function of location is given on the scale to the right of FIG.
13A for a mask having 100×100 µm squares.

The experiment was repeated a number of times through
various masks. The fluorescence pattern for a 50 µm mask is
illustrated in FIG. 13B, for a 20 µm mask in FIG. 13C, and
for a 10 µm mask in FIG. 13D. The mask pattern is distinct
down to at least about 10 µm squares using this lithographic
technique.

H. Attachment of YGGFL and Subsequent Exposure to Herz
Antibody and Goat Antimouse

In order to establish that receptors to a particular polypeptide
sequence would bind to a surface-bound peptide and be
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detected, Leu enkephalin was coupled to the surface and 
recognized by an antibody. A slide was derivatized with
0.1% aminopropyltriethoxysilane and protected with
NVOC. A 500 µm checkerboard mask was used to expose
the slide in a flow cell using backside contact printing. The
Leu enkephalin sequence (H2N-tyrosine, glycine, glycine,
phenylalanine, leucine-CO2H, otherwise referred to herein as
YGGFL) was attached via its carboxy end to the exposed
amino groups on the surface of the slide. The peptide was
added in DMF solution with the BOP/HOBT/DIEA coupling
reagents and recirculated through the flow cell for 2
hours at room temperature.

A first antibody, known as the Herz antibody, was applied 
to the surface of the slide for 45 minutes at 2 µg/ml in a
supercocktail (containing 1% BSA and 1% ovalbumin also
in this case). A second antibody, goat anti-mouse fluorescein
conjugate, was then added at 2 µg/ml in the supercocktail
buffer, and allowed to incubate for 2 hours.

The results of this experiment are provided in FIG. 14. 
Again, this figure illustrates fluorescence intensity as a 
function of position. The fluorescence scale is shown on the 
right. This image was taken at 10 µm steps. This figure
indicates that not only can deprotection be carried out in a
well defined pattern, but also that (1) the method provides
for successful coupling of peptides to the surface of the
substrate, (2) the surface of a bound peptide is available for
binding with an antibody, and (3) the detection apparatus
capabilities are sufficient to detect binding of a receptor.

I. Monomer-by-Monomer Formation of YGGFL and Subsequent
Exposure to Labeled Antibody

Monomer-by-monomer synthesis of YGGFL and GGFL
in alternate squares was performed on a slide in a checker- 
board pattern and the resulting slide was exposed to the Herz 
antibody. This experiment and the results thereof are illus- 
trated in FIGS. 15A, 15B, 15C, and 15D. 

In FIG. 15A, a slide is shown which is derivatized with
the aminopropyl group, protected in this case with t-BOC
(t-butoxycarbonyl). The slide was treated with TFA to
remove the t-BOC protecting group. ε-Aminocaproic acid,
which was t-BOC protected at its amino group, was then
coupled onto the aminopropyl groups. The aminocaproic
acid serves as a spacer between the aminopropyl group and
the peptide to be synthesized. The amino end of the spacer
was deprotected and coupled to NVOC-leucine. The entire
slide was then illuminated with 12 mW of 325 nm broadband
illumination. The slide was then coupled with NVOC-phenylalanine
and washed. The entire slide was again
illuminated, then coupled to NVOC-glycine and washed.
The slide was again illuminated and coupled to NVOC-glycine
to form the sequence shown in the last portion of
FIG. 15A.

As shown in FIG. 15B, alternating regions of the slide
were then illuminated using a projection print using a
500×500 µm checkerboard mask; thus, the amino group of
glycine was exposed only in the lighted areas. When the next
coupling chemistry step was carried out, NVOC-tyrosine
was added, and it coupled only at those spots which had
received illumination. The entire slide was then illuminated
to remove all the NVOC groups, leaving a checkerboard of
YGGFL in the lighted areas and in the other areas, GGFL.
The Herz antibody (which recognizes the YGGFL, but not
GGFL) was then added, followed by goat anti-mouse fluorescein
conjugate.
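
The masking logic of this experiment can be modeled with a short Python sketch: four whole-slide couplings build GGFL everywhere, and a final coupling through a checkerboard mask adds tyrosine only at illuminated sites. The grid size and function names are hypothetical, and the model assumes ideal (complete) deprotection and coupling, which a real slide would only approximate.

```python
def checkerboard(rows, cols):
    """Boolean mask: True where light passes, in a checkerboard pattern."""
    return [[(r + c) % 2 == 0 for c in range(cols)] for r in range(rows)]


def couple(surface, mask, residue):
    """Add one residue at every illuminated (deprotected) site.

    Synthesis grows from the surface-bound carboxy end, so each new
    residue is written at the left (amino) end of the sequence string.
    """
    return [
        [residue + seq if lit else seq for seq, lit in zip(seq_row, mask_row)]
        for seq_row, mask_row in zip(surface, mask)
    ]


rows, cols = 4, 4
surface = [["" for _ in range(cols)] for _ in range(rows)]
whole_slide = [[True] * cols for _ in range(rows)]

# Leucine, phenylalanine, glycine, glycine coupled over the entire slide.
for residue in "LFGG":
    surface = couple(surface, whole_slide, residue)

# Tyrosine coupled only where the checkerboard mask admitted light.
surface = couple(surface, checkerboard(rows, cols), "Y")
```

In this model the illuminated squares hold YGGFL and the remaining squares hold GGFL, which is the pattern the antibody binding is then used to read out.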

The resulting fluorescence scan is shown in FIG. 15C, and 
the scale for the fluorescence intensity is again given on the 
right. Dark areas contain the tetrapeptide GGFL, which is 
not recognized by the Herz antibody (and thus there is no 
binding of the goat anti-mouse antibody with fluorescein
conjugate), and in the red areas YGGFL is present. The
YGGFL pentapeptide is recognized by the Herz antibody
and, therefore, there is antibody in the lighted regions for the
fluorescein-conjugated goat anti-mouse to recognize.

Similar patterns are shown for a 50 µm mask used in
direct contact ("proximity print") with the substrate in FIG. 
15D. Note that the pattern is more distinct and the corners 
of the checkerboard pattern are touching when the mask is 
placed in direct contact with the substrate (which reflects the 
increase in resolution using this technique). 
J. Monomer-by-Monomer Synthesis of YGGFL and PGGFL 

A synthesis using a 50 µm checkerboard mask similar to
that shown in FIG. 15 was conducted. However, P was added
to the GGFL sites on the substrate through an additional
coupling step. P was added by exposing protected GGFL to
light through a mask, and subsequent exposure to P in the
manner set forth above. Therefore, half of the regions on the 
substrate contained YGGFL and the remaining half con- 
tained PGGFL. 

The fluorescence plot for this experiment is provided in 
FIG. 16. As shown, the regions are again readily discernable. 
This experiment demonstrates that antibodies are able to 
recognize a specific sequence and that the recognition is not 
length-dependent. 

K. Monomer-by-Monomer Synthesis of YGGFL and YPGGFL

In order to further demonstrate the operability of the 
invention, a 50 µm checkerboard pattern of alternating
YGGFL and YPGGFL was synthesized on a substrate using 
techniques like those set forth above. The resulting fluores- 
cence plot is provided in FIG. 17. Again, it is seen that the 
antibody is clearly able to recognize the YGGFL sequence 
and does not bind significantly at the YPGGFL regions. 
L. Synthesis of an Array of Sixteen Different Amino Acid 
Sequences and Estimation of Relative Binding Affinity to 
Herz Antibody 

Using techniques similar to those set forth above, an array 
of 16 different amino acid sequences (replicated four times) 
was synthesized on each of two glass substrates. The 
sequences were synthesized by attaching the sequence 
NVOC-GFL across the entire surface of the slides. Using a 
series of masks, two layers of amino acids were then 
selectively applied to the substrate. Each region had dimensions
of 0.25 cm × 0.0625 cm. The first slide contained amino
acid sequences containing only L amino acids while the
second slide contained selected D amino acids. FIGS. 18A
and 18B illustrate a map of the various regions on the first 
and second slides, respectively. The patterns shown in FIGS. 
18A and 18B were duplicated four times on each slide. The 
slides were then exposed to the Herz antibody and 
fluorescein-labeled goat anti-mouse. 

FIG. 19 is a fluorescence plot of the first slide, which 
contained only L amino acids. Red indicates strong binding 
(149,000 counts or more) while black indicates little or no 
binding of the Herz antibody (20,000 counts or less). The 
bottom right-hand portion of the slide appears "cut off"
because the slide was broken during processing. The
sequence YGGFL is clearly most strongly recognized. The
sequences YAGFL and YSGFL also exhibit strong recognition
by the antibody. By contrast, most of the remaining
sequences show little or no binding. The four duplicate 
portions of the slide are extremely consistent in the amount 
of binding shown therein. 

FIG. 20 is a fluorescence plot of the second slide. Again, 
strongest binding is exhibited by the YGGFL sequence. 
Significant binding is also detected to YaGFL, YsGFL, and 
YpGFL. The remaining sequences show less binding with 
the antibody. Note the low binding efficiency of the 
sequence yGGFL. 

Table 6 lists the various sequences tested in order of 
relative fluorescence, which provides information regarding 
relative binding affinity. 
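
Ordering sequences by relative fluorescence, as described for Table 6, can be sketched as below. The two thresholds are the ones stated for the color scale of FIG. 19 (149,000 counts or more for strong binding, 20,000 counts or less for little or no binding); the "intermediate" label, the function names, and the idea of normalizing to the brightest region are assumptions made for illustration. No measured counts from the specification are reproduced.

```python
STRONG_COUNTS = 149_000  # "strong binding" threshold stated for FIG. 19
WEAK_COUNTS = 20_000     # "little or no binding" threshold stated for FIG. 19


def classify(counts):
    """Label a region by its fluorescence counts."""
    if counts >= STRONG_COUNTS:
        return "strong"
    if counts <= WEAK_COUNTS:
        return "little or none"
    return "intermediate"  # assumed label for counts between the thresholds


def rank_by_fluorescence(counts_by_sequence):
    """Sort sequences from brightest to dimmest.

    Returns (sequence, counts, fraction of brightest, label) tuples.
    """
    brightest = max(counts_by_sequence.values())
    ranked = sorted(
        counts_by_sequence.items(), key=lambda item: item[1], reverse=True
    )
    return [
        (seq, counts, counts / brightest, classify(counts))
        for seq, counts in ranked
    ]
```

A ranking of this kind gives relative, not absolute, affinities, which is consistent with the semiquantitative use of signal intensity described earlier in the specification.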
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THE HUMAN 
GENOME 



A historic 
moment for 
the scientific 
endeavor. 



Humanity has been given a great gift. With the completion of the human
genome sequence, we have received a powerful tool for unlocking the 
secrets of our genetic heritage and for finding our place among the other 
participants in the adventure of life. 

This week's issue of Science contains the report of the sequencing of 
the human genome from a group of authors led by Craig Venter of Celera
Genomics. The report of the sequencing of the human genome from the
publicly funded consortium of laboratories led by Francis Collins appears
in this week's Nature. This stunning achievement has been portrayed — 
often unfairly— as a competition between two 
ventures, one public and one private. That characterization detracts from 
the awesome accomplishment jointly unveiled this week. In truth, each 
project contributed to the other. The inspired vision that launched the
publicly funded project roughly 10 years ago reflected, and now rewards,
the confidence of those who believe that the pursuit of large-scale fundamental
problems in the life sciences is in the national interest. The technical
innovation and drive of Craig Venter and his colleagues made it possible 
to celebrate this accomplishment far sooner than was believed possible. 
Thus, we can salute what has become, in the end, not a contest but a 
marriage (perhaps encouraged by shotgun) between public funding and 
private entrepreneurship. 

There are excellent scientific reasons for applauding an outcome that 
has given us two winners. Two sequences are better than one; the opportunity for comparison and con- 
vergence is invaluable. Indeed, a real-world proof of the importance of access to both sets of data can 
be found in the pages of this issue of Science, in the comparative analysis by Olivier et al. (p. 1298).

Although we have made the point before, it is worth repeating that the sequencing of the human 
genome represents, not an ending, but the beginning of a new approach to biology. As Galas says in
his Viewpoint (p. 1257), the knowledge that all of the genetic components of any process can be
identified will give extraordinary new power to scientists. Because of this breakthrough, research
can evolve from analyzing the effects of individual genes to a more integrated view that examines
whole ensembles of genes as they interact to form a living human being. Several articles in this issue
highlight how this approach is already beginning to revolutionize the way we look at human disease.
This has been a massive project, on a scale unparalleled in the history of biology, but of course 
it has built on the scientific insights of centuries of investigators. By coincidence, this landmark 
announcement falls during the week of the anniversary of the birth of Charles Darwin. Darwin's 
message that the survival of a species can depend on its ability to evolve in the face of change is 
peculiarly pertinent to discussions that have gone on in the past year over access to the Celera data. 
(Full information regarding the agreements that were reached to make the data available can be
found at www.sciencemag.org/feature/data/announcement/gsp.shl.) We are willing to be flexible in
allowing data repositories other than the traditional GenBank, while insisting on access to all the
data needed to verify conclusions. In this domain, change is everywhere: Commercial researchers 
are producing more and more potentially valuable sequences, yet (at least in the United States) 
laws governing databases provide scant protection against piracy. Had the Celera data been kept se- 
cret, it would have been a serious loss to the scientific community. We hope that our adaptability in 
the face of change will enable other proprietary data to be published after peer review, in a way that
satisfies our continuing commitment to full access. 

It should be no surprise that an achievement so stunning, and so carefully watched, has created
new challenges for the scientific venture. Science is proud to have played a role in bringing this 
discovery onto the public stage. It is literally true that this is a historic moment for the scientific en- 
deavor. The human genome has been called the Book of Life. Rather, it is a library, in which, with 
rules that encourage exploration and reward creativity, we can find many of the books that will 
help define us and our place in the great tapestry of life. 

Barbara R. Jasny and Donald Kennedy 
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ABSTRACT 



The present invention provides polynucleotides (kin) which 
identify and encode novel protein kinases (KIN) expressed 
in various human cells and tissues. The present invention 
also provides for antisense sequences and oligonucleotides 
designed from the nucleotide sequences or their comple- 
ments. The invention further provides genetically engi- 
neered expression vectors and host cells for the production 
of purified KIN peptides, antibodies capable of binding KIN, 
and inhibitors which specifically bind KIN. The invention specifically
provides for diagnostic kits and assays which identify
a disorder or disease with altered kinase expression and 
allow monitoring of patients during drug therapy. These 
assays utilize oligonucleotides or antibodies produced using 
the kin polynucleotides. 

4 Claims, No Drawings 
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HUMAN KINASE HOMOLOGS 
FIELD OF THE INVENTION 

The present invention is in the field of molecular biology; 
more particularly, the present invention describes nucleic 
acid sequences for novel human kinase homologs. 

BACKGROUND OF THE INVENTION 

Kinases regulate many different cell proliferation, 
differentiation, and signalling processes by adding phos- 
phate groups to proteins. Uncontrolled signalling has been 
implicated in inflammation, oncogenesis, arteriosclerosis, 
and psoriasis. Reversible protein phosphorylation is the 
main strategy for controlling activities of eukaryotic cells. It 
is estimated that more than 1000 of the 10,000 proteins 
active in a typical mammalian cell are phosphorylated. The 
high energy phosphate which drives activation is generally 
transferred from adenosine triphosphate molecules (ATP) to 
a particular protein by protein kinases and removed from 
that protein by protein phosphatases. 

Phosphorylation occurs in response to extracellular sig- 
nals (hormones, neurotransmitters, growth and differentia- 
tion factors, etc), cell cycle checkpoints, and environmental 
or nutritional stresses and is roughly analogous to the
turning on of a molecular switch. When the switch goes on, the
appropriate protein kinase activates a metabolic enzyme, 
regulatory protein, receptor, cytoskeletal protein, ion chan- 
nel or pump, or transcription factor. 

The kinases comprise the largest known protein family, a 
superfamily of enzymes with widely varied functions and 
specificities. They are usually named after their substrate, 
their regulatory molecules, after some aspect of a mutant
phenotype or arbitrarily. Almost all kinases contain a similar 
250-300 amino acid catalytic domain. The N-terminal 
domain, which contains subdomains I-IV, generally folds 
into a two-lobed structure and binds and orients the ATP (or 
GTP) donor molecule. The larger C terminal lobe, which 
contains subdomains VIA-XI, binds the protein substrate 
and carries out the transfer of the gamma phosphate from 
ATP to the hydroxyl group of a serine, threonine, or tyrosine 
residue. Subdomain V spans the two lobes. 

The kinases may be categorized into families by the 
different amino acid sequences (generally between 5 and 
100 residues) located on either side of, or inserted into loops 
of, the kinase domain. These added amino acid sequences 
allow the regulation of each kinase as it recognizes and 
interacts with its target protein. The primary structure of the 
kinase domains is conserved and can be further subdivided 
into 12 subdomains. The following residues are relatively 
(~95%) invariant: G50 and G52 in subdomain I, K72 in
subdomain II, G91 in subdomain III, E208 in subdomain VIII,
D220 and G225 in subdomain IX, and the motifs or patterns
of amino acids in subdomains VIB, VIII and IX (Hardie G. 
and Hanks S. (1995) The Protein Kinase Facts Books, I and 
II, Academic Press, San Diego, Calif.). 

The cyclin dependent protein kinase (cdk) family includes 
proteins which are turned on and off as the cell proceeds 
through the cell cycle. A cdk is active as a kinase only when 
it is bound to a cyclin. Cdk activation simultaneously 
requires both the addition of a high energy phosphate to a 
threonine residue by a kinase and the removal of a 
covalently-bound phosphate from a specific tyrosine residue 
by a phosphatase. The concentration of some cyclins rises 
gradually through a particular part of the cell cycle until their 
targeted proteolysis ends the coordinated interaction among 
the cyclin, kinase, and phosphatase molecules.

The second-messenger dependent protein kinases primarily
mediate the effects of second messengers such as cyclic
AMP (cAMP), cyclic GMP, inositol triphosphate,
phosphatidylinositol 3,4,5-triphosphate, cyclic ADP-ribose,
arachidonic acid and diacylglycerol. For purposes of
example, the structure and function of cyclic AMP-dependent
protein kinase (A-kinase) will be described.
Mammalian cells generally contain at least two forms of
A-kinase: type 1, which is cytosolic, and type 2, which is
bound to plasma membrane, nuclear membrane or microtubules.
In its inactive state, A-kinase consists of a complex of
two catalytic subunits and two regulatory subunits. When
each regulatory subunit has bound two molecules of cAMP,
the catalytic subunit is activated and can transfer a high
energy phosphate from ATP to the serine or threonine of a
substrate protein. Substrate proteins are usually marked by
the presence of two or more basic amino acids on their
amino terminal sides. A-kinase is important in metabolism
of glycogen, in inactivation of phosphatase inhibitor
protein, in transcription of genes which contain a regulatory
region called the cAMP response element (CRE), and in
regulation of the ion channels of olfactory neurons.

Protein kinase C (PKC) is a water-soluble, Ca++-dependent
kinase, commonly found in brain tissue, which
moves to the plasma membrane in the presence of Ca++ ions.
Approximately half of the known isoforms of PKC are
activated initially by diacylglycerol and phosphatidylserine.
Prolonged activation of PKC depends on continued production
of diacylglycerol molecules which are formed when
phospholipases cleave phosphatidylcholine. In nerve cells,
PKC phosphorylates ion channels and alters the excitability
of the cell membrane. In other cells, activation of PKC
increases gene transcription either by triggering a protein
kinase cascade which activates a regulatory element (much
like CRE above) or by phosphorylating and deactivating an
inhibitor of the regulatory protein.

Ca++/calmodulin-dependent protein kinases (CaM-kinases)
mediate most of the actions of Ca++ in human cells.
The CaM-kinases include enzymes with narrow substrate
specificity, such as myosin light chain kinase, which activates
smooth muscle contraction, and phosphorylase kinase, which
activates glycogen breakdown, and the multifunctional
enzyme CaM-kinase II, which is found in all cells. Phosphorylase
kinase has four subunits: γ is the catalytic moiety
and α, β and δ are regulatory. Since subunits α and β are
phosphorylated by A-kinase and subunit δ is Ca++/calmodulin,
glycogen breakdown can be activated by either
cAMP or Ca++.

CaM-kinase II is particularly enriched in catecholamine
synapses. In those neurons, Ca++ influx stimulates both the
release of dopamine, noradrenaline or adrenaline and also
their resynthesis through the activation of CaM-kinase II.
Although the main role of CaM-kinase II is phosphorylation
of tyrosine hydroxylase, the rate-limiting enzyme of catecholamine
synthesis, CaM-kinase II also autophosphorylates
and remains active until phosphatases overwhelm it.

Transmembrane protein-tyrosine kinases are receptors for
most growth factors. The first characterized receptor for
epidermal growth factor (EGF) is a single pass transmembrane
protein of about 1200 amino acids with an extracellular
glycosylated portion that interacts with the 53 amino
acid EGF molecule. Binding activates the transfer of a
phosphate group from ATP to selected tyrosine side chains
of the receptor and other specific proteins. Other protein
receptors with similar structure include those for the following growth
and differentiation factors (GF): platelet derived GF, fibroblast
GF, hepatocyte GF, insulin and insulin-like GFs, nerve
GF, vascular endothelial GF, macrophage colony stimulating
factor, etc. Each receptor, upon dimerization, phosphorylates
itself to initiate the intracellular signalling cascade.

Many protein-tyrosine kinases lack transmembrane
regions and form a complex with the intracellular regions of
other cell surface receptors. The best known of these
non-receptor PTKs (NR-PTKs) are
the Src kinase family (Src, Yes, Fgr, Fyn, Lck, Lyn, Hck,
Blk, etc) and the Janus kinase family (Jak1, Jak2, Jak3,
Tyk2, etc). The Src PTKs are located on the cytoplasmic side
of the plasma membrane and are characterized by Src
homology regions 2 and 3 (SH2 and SH3). Through these regions, Src PTKs
recognize short peptide motifs bearing phosphotyrosine or
proline residues, respectively, and mediate protein-protein
interactions that regulate a whole range of intracellular
signalling molecules. Janus PTKs contain PTK or PTK-like
domains and interact with growth hormone, prolactin, and
some of the same cytokine receptors as Src PTKs. The
cytokine receptors are unique both in their ability to recruit
multiple PTKs and in the diversity of their intracellular
domains which allow flexibility in their responses within
different cell types (Taniguchi T. (1995) Science
268:251-55). Src and Jak kinases were first identified as the
products of mutant oncogenes in cancer cells where their
activation was no longer subject to normal cellular controls.

Extracellular signalling proteins such as transforming
growth factor-β (TGF-β), activins, bone morphogenetic
protein, and related members of the TGF-β superfamily
interact with receptor serine/threonine kinases. Like the EGF
receptor above, these receptor kinases have a single pass transmembrane
domain with a serine/threonine kinase domain on the
cytosolic side of the plasma membrane. The signalling
pathways which are activated by binding the extracellular
signalling molecules are presently under investigation.

Mitogen-activated protein (MAP) kinases also regulate
intracellular signalling pathways. They mediate signal transduction
from cell surface to nuclei via phosphorylation
cascades. Several subgroups have been identified, and each
manifests different substrate specificities and responds to
distinct extracellular stimuli (Egan S. E. and Weinberg R. A.
(1993) Nature 365:781-783).

MAP kinase signalling pathways are present in mamma- 
lian cells as well as in yeast. The extracellular stimuli which 
activate mammalian pathways include epidermal growth 
factor (EGF), ultraviolet light, hyperosmolar medium, heat
shock, endotoxic lipopolysaccharide (LPS), and pro-inflammatory
cytokines such as tumor necrosis factor (TNF)
and interleukin-1 (IL-1). In Saccharomyces cerevisiae,
exposure to mating pheromone or hyperosmolar environments
activates the various MAP kinase signalling pathways.

Mammalian cells have at least three subgroups of MAP 
kinases (Derijard B. et al (1995) Science 267:682-5), each 
distinguished by a tripeptide motif. They are extracellular 
signal-regulated protein kinases (ERK) characterized by 
Thr-Glu-Tyr; c-Jun amino-terminal kinases (JNK) characterized
by Thr-Pro-Tyr; and p38 kinase characterized by
Thr-Gly-Tyr. Each subgroup is activated by dual phosphorylation
of threonine and tyrosine residues by MAP kinase
kinases located upstream in the phosphorylation cascade.
Activated MAP kinases, in turn, phosphorylate downstream
effectors, ultimately leading to intracellular changes.
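
The tripeptide motifs that distinguish the three mammalian subgroups lend themselves to a simple sequence scan. The following is a minimal sketch only: the function name and the toy peptide are hypothetical, invented for illustration, and a tripeptide match anywhere in a sequence does not by itself establish that a protein is a MAP kinase.

```python
# Dual-phosphorylation motifs named in the text, in one-letter amino acid code.
MOTIF_TO_SUBGROUP = {
    "TEY": "ERK",  # Thr-Glu-Tyr
    "TPY": "JNK",  # Thr-Pro-Tyr
    "TGY": "p38",  # Thr-Gly-Tyr
}


def find_map_kinase_motifs(protein: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, motif, subgroup) for every motif occurrence."""
    seq = protein.upper()
    hits = []
    for i in range(len(seq) - 2):
        tripeptide = seq[i:i + 3]
        subgroup = MOTIF_TO_SUBGROUP.get(tripeptide)
        if subgroup is not None:
            hits.append((i + 1, tripeptide, subgroup))
    return hits


if __name__ == "__main__":
    # Hypothetical peptide fragment, not taken from any SEQ ID NO.
    toy_fragment = "HDHTGFLTEYVATRWYR"
    print(find_map_kinase_motifs(toy_fragment))
```

In practice such a scan would be restricted to the activation loop of a sequence already known to contain a kinase domain.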

The ERK signal transduction pathway is activated via 
tyrosine kinase receptors on the plasmalemma. When 
growth factors bind to these receptors, the receptors bind to noncatalytic
Src homology (SH) adaptor proteins (SH2-SH3-SH2) and a
guanine nucleotide releasing protein (GNRP). GNRP
promotes the exchange of GDP for GTP on Ras proteins, members of the
large family of guanine nucleotide binding proteins
(G-proteins), and thereby activates them. Activated Ras proteins bind to the protein kinase
c-Raf-1 and activate it. The activated Raf-1
kinase subsequently phosphorylates MAP kinase kinase
(MKK) which, in turn, activates ERKs.

ERKs are proline-directed protein kinases which phosphorylate
Ser/Thr-Pro motifs. In fact, cytoplasmic phospholipase
A2 (cPLA2) and transcription factor Elk-1 are substrates
of ERKs. The ERKs phosphorylate Ser 505 of cPLA2
thereby increasing its enzymatic activity and resulting in 
release of arachidonic acid and the formation of lysophos- 
pholipids from membrane phospholipids. Likewise, phos- 
phorylation of the transcription factor Elk-1 by ERK ulti- 
mately increases transcriptional activity. 
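
Candidate proline-directed sites can be listed by searching a protein sequence for a serine or threonine immediately followed by a proline. This is a sketch under stated assumptions: the function name and test peptide are hypothetical, and the minimal Ser/Thr-Pro pattern ignores any additional sequence context that a real substrate may require.

```python
import re

# Ser or Thr immediately followed by Pro. The lookahead leaves the proline
# unconsumed, so closely spaced sites are all reported.
SER_THR_PRO = re.compile(r"[ST](?=P)")


def proline_directed_sites(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, residue) for each Ser/Thr that precedes a Pro."""
    return [(m.start() + 1, m.group()) for m in SER_THR_PRO.finditer(protein.upper())]


if __name__ == "__main__":
    # Hypothetical peptide used only to exercise the function.
    print(proline_directed_sites("MASPKTPLSSP"))
```

A hit only marks a residue as a candidate; whether it is phosphorylated has to be established experimentally.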

JNK is distantly related to the ERKs and is similarly
activated by dual phosphorylation of Thr and Tyr and by
MKK4 (Davis R. (1994) TIBS 19:470-473). The JNK signal
transduction pathway is also initiated by ultraviolet light,
osmotic stress, and the pro-inflammatory cytokines, TNF
and IL-1. Phosphorylation of Ser 63 and Ser 73 in the NH2-terminal
domain of the transcription factor c-Jun increases
transcriptional activity. 

p38 is a 41 kD protein containing 360 amino acids. Its
dual phosphorylation is activated by MKK3 and MKK4,
heat shock, hyperosmolar medium, IL-1 or LPS endotoxin 
(Han J. et al (1994) Science 265:808-811). Sepsis produced 
by LPS is characterized by fever, chills, tachypnea, and 
tachycardia, and severe cases may result in septic shock 
which includes hypotension and multiple organ failure. 

Cells respond to LPS as a stress signal because it alters 
normal cellular processes and induces the release of sys- 
temic mediators such as TNF. CD14 is a 
glycosylphosphatidyl-inositol-anchored membrane glycoprotein
which serves as an LPS receptor on the plasmalemma
of monocytic cells. The binding of LPS to CD14 causes
rapid protein tyrosine phosphorylation of the 44- and 42-/40-kD
isoforms of MAP kinases. Although they bind LPS,
these MAP kinase isoforms do not appear to belong to the 
p38 subgroup. 

A detailed understanding of kinase pathways and signal
transduction is beginning to reveal some mechanisms for 
interceding in the progression of inflammatory illnesses and 
of uncontrolled cell proliferation. The cDNAs, 
oligonucleotides, peptides and antibodies for the human 
kinases, which are the subject of this invention and are listed 
in Table 1, provide a plurality of tools for studying signalling 
cascades in various cells and tissues and for diagnosing and 
selecting inhibitors or drugs with the potential to intervene 
in various disorders or diseases in which altered kinase 
expression is implicated. The disorders or diseases include,
but are not limited to, human X-linked agammaglobulinemia,
nonspherocytic hemolytic anemia, atherosclerosis, carcinomas
(breast, ovary, renal, squamous cell and prostate),
diabetes, gliomas, glomerular disease, hepatomegaly, Kaposi's
sarcoma, lymphoblastic and myelogenous leukemias,
myoglobinuria, peptic ulcer disease, psoriasis, pulmonary 
fibrosis, restenosis, and septic shock due to cholera, 
Clostridium difficile, E. coli and Shigella (Isselbacher K. J. 
et al (1994) Harrison's Principles of Internal Medicine, 
McGraw-Hill, New York City; Levitzki A. and A. Gazit 
(1995) Science 267:1782-88). 

SUMMARY OF THE INVENTION 

The subject invention provides unique polynucleotides 
(SEQ ID NOs 1-44) which have been identified as novel 
human kinases (kin). These partial cDNAs were identified
among the polynucleotides which comprise various Incyte
cDNA libraries.

The invention comprises polynucleotides which are
complementary to the kin sequences (SEQ ID NOs 1-44).
The invention also comprises the use of kin sequences to
identify and obtain full length human kinase cDNAs such
as SEQ ID NO 45.

The invention further comprises the use of oligomers
from these kin sequences in a kinase kit which can be used
to identify a disorder or disease with altered kinase expression
and provide a method for monitoring progress of a
patient during drug therapy.

Aspects of the invention include use of kin sequences or
recombinant nucleic acids derived from them to produce
purified peptides. Still further aspects of the invention use
these purified peptides to identify antibodies or other molecules
with inhibitory activity toward a particular kinase,
group of kinases or disease.

In addition, the invention comprises the use of kin specific
antibodies in assays to identify a disorder or disease with
altered kinase expression and provides a method to monitor
the progress of a patient during drug therapy.

DESCRIPTION OF THE FIGURE

FIGS. 1A and 1B display the full length nucleotide
sequence for human MAP kinase from stomach tissue (SEQ
ID NO 45; Incyte Clone 214915E) and its predicted amino
acid sequence.

DETAILED DESCRIPTION OF THE INVENTION

Definitions

As used herein, the abbreviation for kinase in lower case
(kin) refers to a gene, cDNA, RNA or nucleic acid sequence
while the upper case version (KIN) refers to a protein,
polypeptide, peptide, oligopeptide, or amino acid sequence.

An "oligonucleotide" or "oligomer" is a stretch of nucleotide
residues which has a sufficient number of bases to be
used in a polymerase chain reaction (PCR). These short
sequences are based on (or designed from) genomic or
cDNA sequences and are used to amplify, confirm, or reveal
the presence of an identical, similar or complementary DNA
or RNA in a particular cell or tissue. Oligonucleotides or
oligomers comprise portions of a DNA sequence having at
least about 10 nucleotides and as many as about 50
nucleotides, preferably about 15 to 30 nucleotides. They are
chemically synthesized and may be used as probes.

"Probes" are nucleic acid sequences of variable length,
preferably between at least about 10 and as many as about
6,000 nucleotides, depending on use. They are used in the
detection of identical, similar, or complementary nucleic
acid sequences. Longer length probes are usually obtained
from a natural or recombinant source, are highly specific and
much slower to hybridize than oligomers. They may be
single- or double-stranded and carefully designed to have
specificity in PCR, hybridization membrane-based, or
ELISA-like technologies.

"Reporter" molecules are chemical moieties used for
labelling a nucleic or amino acid sequence. They include,
but are not limited to, radionuclides, enzymes, fluorescent,
chemi-luminescent, or chromogenic agents. Reporter molecules
associate with, establish the presence of, and may
allow quantification of a particular nucleic or amino acid
sequence.

A "portion" or "fragment" of a polynucleotide or nucleic
acid comprises all or any part of the nucleotide sequence
having fewer nucleotides than about 6 kb, preferably fewer
than about 1 kb which can be used as a probe. Such probes
may be labelled with reporter molecules using nick
translation, Klenow fill-in reaction, PCR or other methods
well known in the art. After pretesting to optimize reaction
conditions and to eliminate false positives, nucleic acid
probes may be used in Southern, northern or in situ hybridizations
to determine whether DNA or RNA encoding the
protein is present in a biological sample, cell type, tissue,
organ or organism.

"Recombinant nucleotide variants" are polynucleotides
which encode a protein. They may be synthesized by making
use of the "redundancy" in the genetic code. Various codon
substitutions, such as the silent changes which produce
specific restriction sites or codon usage-specific mutations,
may be introduced to optimize cloning into a plasmid or
viral vector or expression in a particular prokaryotic or
eukaryotic host system, respectively.

"Linkers" are synthesized palindromic nucleotide
sequences which create internal restriction endonuclease
sites for ease of cloning the genetic material of choice into
various vectors. "Polylinkers" are engineered to include
multiple restriction enzyme sites and provide for the use of
both those enzymes which leave 5' and 3' overhangs such as
BamHI, EcoRI, PstI, KpnI and HindIII or which provide a
blunt end such as EcoRV, SnaBI and StuI.

"Control elements" or "regulatory sequences" are those
nontranslated regions of the gene or DNA such as enhancers,
promoters, introns and 3' untranslated regions which interact
with cellular proteins to carry out replication, transcription,
and translation. They may occur as boundary sequences or
even split the gene. They function at the molecular level and
along with regulatory genes are very important in
development, growth, differentiation and aging processes.

"Chimeric" molecules are polynucleotides or polypeptides
which are created by combining one or more of the
nucleotide sequences of this invention (or their parts) with
additional nucleic acid sequence(s). Such combined
sequences may be introduced into an appropriate vector and
expressed to give rise to a chimeric polypeptide which may
be expected to be different from the native molecule in one
or more of the following kinase characteristics: cellular
location, distribution, ligand-binding affinities, interchain
affinities, degradation/turnover rate, signalling, etc.

"Active" is that state which is capable of being useful or
of carrying out some role. It specifically refers to those
forms, fragments, or domains of an amino acid sequence
which display the biologic and/or immunogenic activity
characteristic of the naturally occurring kinase.

"Naturally occurring KIN" refers to a polypeptide produced
by cells which have not been genetically engineered
or which have been genetically engineered to produce the
same sequence as that naturally produced. Specifically contemplated
are various polypeptides which arise from post-translational
modifications. Such modifications of the
polypeptide include but are not limited to acetylation,
carboxylation, glycosylation, phosphorylation, lipidation
and acylation.

"Derivative" refers to those polypeptides which have been
chemically modified by such techniques as ubiquitination,
labelling (see above), pegylation (derivatization with polyethylene
glycol), and chemical insertion or substitution of
amino acids such as ornithine which do not normally occur
in human proteins.

"Recombinant polypeptide variant" refers to any polypeptide
which differs from naturally occurring KIN by amino
acid insertions, deletions and/or substitutions, created using
recombinant DNA techniques. Guidance in determining
which amino acid residues may be replaced, added or
deleted without abolishing characteristics of interest may be
found by comparing the sequence of KIN with that of related
polypeptides and minimizing the number of amino acid
sequence changes made in highly conserved regions.

Amino acid "substitutions" are defined as one for one
amino acid replacements. They are conservative in nature
when the substituted amino acid has similar structural and/or
chemical properties. Examples of conservative replacements
are substitution of a leucine with an isoleucine or valine, an
aspartate with a glutamate, or a threonine with a serine.

Amino acid "insertions" or "deletions" are changes to or
within an amino acid sequence. They typically fall in the
range of about 1 to 5 amino acids. The variation allowed in
a particular amino acid sequence may be experimentally
determined by producing the peptide synthetically or by
systematically making insertions, deletions, or substitutions
of nucleotides in the kin sequence using recombinant DNA
techniques.

A "signal or leader sequence" is a short amino acid
sequence which is or can be used, when desired, to direct the
polypeptide through a membrane of a cell. Such a sequence
may be naturally present on the polypeptides of the present
invention or provided from heterologous sources by recombinant
DNA techniques.

An "oligopeptide" is a short stretch of amino acid residues
and may be expressed from an oligonucleotide. It may be
functionally equivalent to and either the same length as or
considerably shorter than a "fragment", "portion" or
"segment" of a polypeptide. Such sequences comprise a
stretch of amino acid residues of at least about 5 amino acids
and often about 17 or more amino acids, typically at least
about 9 to 13 amino acids, and of sufficient length to display
biologic and/or immunogenic activity.

An "inhibitor" is a substance which retards or prevents a
chemical or physiological reaction or response. Common
inhibitors include but are not limited to antisense molecules,
antibodies, antagonists and their derivatives.

A "standard" is a quantitative or qualitative measurement
for comparison. Preferably, it is based on a statistically
appropriate number of samples and is created to use as a
basis of comparison when performing diagnostic assays,
running clinical trials, or following patient treatment profiles.
The samples of a particular standard may be normal or
similarly abnormal.

"Animal" as used herein may be defined to include
human, domestic (cats, dogs, etc), agricultural (cows,
horses, sheep, goats, chicken, fish, etc) or test species (frogs,
mice, rats, rabbits, simians, etc).

"Disorders or diseases" in which altered kinase activity
has been implicated specifically include, but are not limited
to, human X-linked agammaglobulinemia, nonspherocytic
hemolytic anemia, atherosclerosis, carcinomas (breast,
ovary, renal, squamous cell and prostate), diabetes, gliomas,
glomerular disease, hepatomegaly, Kaposi's sarcoma, lymphoblastic
and myelogenous leukemias, myoglobinuria,
peptic ulcer disease, psoriasis, pulmonary fibrosis,
restenosis, and septic shock due to cholera, Clostridium
difficile, E. coli and Shigella.

Since the list of technical and scientific terms cannot be all
encompassing, any undefined terms shall be construed to
have the same meaning as is commonly understood by one
of skill in the art to which this invention belongs.
Furthermore, the singular forms "a", "an" and "the" include
plural referents unless the context clearly dictates otherwise.
For example, reference to a "restriction enzyme" or a "high
fidelity enzyme" may include mixtures of such enzymes and
any other enzymes fitting the stated criteria, or reference to
the method includes reference to one or more methods for
obtaining cDNA sequences which will be known to those
skilled in the art or will become known to them upon reading
this specification.

Before the present sequences, variants, formulations and
methods for making and using the invention are described,
it is to be understood that the invention is not to be limited
only to the particular sequences, variants, formulations or
methods described. The sequences, variants, formulations
and methodologies may vary, and the terminology used
herein is for the purpose of describing particular embodiments.
The terminology and definitions are not intended to
be limiting since the scope of protection will ultimately
depend upon the claims.

DESCRIPTION OF THE INVENTION

The present invention provides for purified partial protein
kinase cDNAs which were expressed in various human
[illegible] and [illegible] therefrom. These sequences were identified
by their similarity to published or known open reading
frames or untranslated control regions. Since protein kinases
are associated with basic cellular processes such as cell
proliferation, differentiation and cell signalling, these nucleotide
sequences are useful in the characterization of and
delineation of normal and abnormal processes. Kinase
nucleotide sequences are useful in diagnostic assays used to
evaluate the role of a specific kinase in normal, diseased, or
therapeutically treated cells.

The nucleotide sequences have numerous
applications in techniques known to those skilled in the art
of molecular biology. These techniques include their use as
hybridization probes, for chromosome and gene mapping, in
PCR technologies, in the production of sense or antisense
nucleic acids, in screening for new therapeutic molecules,
etc. These examples are well known and are not intended to
be limiting. Furthermore, the nucleotide sequences disclosed
herein may be used in molecular biology techniques that
have not yet been developed, provided the new techniques
rely on properties of nucleotide sequences that are currently
known, including but not limited to such properties as the
triplet genetic code and specific base pair interactions.

As a result of the degeneracy of the genetic code a
multitude of kinase-encoding nucleotide sequences may be
produced and some of these will bear only minimal homology
to the endogenous sequence of any known and naturally
occurring kinase. This invention has specifically contemplated
each and every possible variation of nucleotide
sequence that could be made by selecting combinations
based on possible codon choices. These combinations are
made in accordance with the standard triplet genetic code as
applied to the nucleotide sequence of naturally occurring
kinases, and all such variations are to be considered as being
specifically disclosed.

Although the kinase nucleotide sequences and their
derivatives or variants are preferably capable of identifying
the nucleotide sequence of the naturally occurring kinase
under optimized conditions, it may be advantageous to
produce kinase-encoding nucleotide sequences possessing a
substantially different codon usage. Codons can be selected
to increase the rate at which expression of the peptide occurs
in a particular prokaryotic or eukaryotic expression host in
accordance with the frequency with which particular codons
are utilized by the host. Other reasons for substantially
altering the nucleotide sequence encoding the kinase without
altering the encoded amino acid sequence include the production
of RNA transcripts having more desirable
properties, such as a longer half-life, than transcripts pro- 
duced from the naturally occurring sequence. 
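
The size of the "multitude" of sequences permitted by codon degeneracy can be made concrete by multiplying the number of synonymous codons at each residue. The sketch below assumes the standard genetic code; the function names and the three-residue peptide are illustrative only and are not drawn from any SEQ ID NO.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons enumerated with base order T, C, A, G
# (first base varies slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Number of synonymous codons available for each amino acid.
SYNONYMS = Counter(CODON_TABLE.values())


def translate(dna: str) -> str:
    """Translate a DNA coding sequence, ignoring any trailing partial codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def count_encodings(peptide: str) -> int:
    """Count the distinct nucleotide sequences that encode the given peptide."""
    return prod(SYNONYMS[aa] for aa in peptide.upper())


if __name__ == "__main__":
    # Two different coding sequences for the same illustrative tripeptide.
    print(translate("ATGAAACTG"), translate("ATGAAGTTA"))
    print(count_encodings("MKL"))
```

Because the count is a product over residues, it grows very quickly with peptide length, which is why codon choice leaves ample room for host-specific optimization without changing the encoded amino acid sequence.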

Nucleotide sequences encoding a kinase may be joined to
a variety of other nucleotide sequences by means of well
established recombinant DNA techniques (Sambrook J. et al
(1989) Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory, Cold Spring Harbor, N.Y.; or
Ausubel F. M. et al (1989) Current Protocols in Molecular
Biology, John Wiley & Sons, New York City). Useful
sequences for joining to the kinase include an assortment of
cloning vectors such as plasmids, cosmids, lambda phage
derivatives, phagemids, and the like. Vectors of interest
include vectors for replication, expression, probe generation,
sequencing, and the like. In general, vectors of interest may
contain an origin of replication functional in at least one
organism, convenient restriction endonuclease sensitive
sites, and selectable markers for one or more host cell
systems.

PCR as described in U.S. Pat. Nos. 4,683,195; 4,800,195;
and 4,965,188 provides additional uses for oligonucleotides
based upon the kinase nucleotide sequence. Such oligomers
are generally chemically synthesized, but they may be of
recombinant origin or a mixture of both. Oligomers generally
comprise two nucleotide sequences, one with sense
orientation (5' to 3') and one with antisense (3' to 5')
employed under optimized conditions for identification of a
specific gene or diagnostic use. The same two oligomers,
nested sets of oligomers, or even a degenerate pool of
oligomers may be employed under less stringent conditions
for identification and/or quantitation of closely related DNA
or RNA sequences.
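
The relationship between the sense and antisense oligomers of a pair can be sketched as follows. The helper names, the default length of 20 (chosen from within the 15 to 30 nucleotide range given in the definitions) and the toy template are assumptions made for illustration. Both primers are written 5' to 3', so the antisense primer is the reverse complement of the far end of the target region.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def primer_pair(template: str, start: int, end: int, length: int = 20) -> tuple[str, str]:
    """Pick a sense/antisense primer pair flanking template[start:end].

    The sense primer copies the first `length` bases of the region; the
    antisense primer is the reverse complement of the last `length` bases.
    """
    region = template[start:end]
    if len(region) < 2 * length:
        raise ValueError("region too short for two non-overlapping primers")
    return region[:length], reverse_complement(region[-length:])


if __name__ == "__main__":
    # Toy template, invented for illustration; real primers would be longer.
    sense, antisense = primer_pair("ATGGCCAAGTTTCCC", 0, 15, length=5)
    print(sense, antisense)
```

Real primer design would also weigh melting temperature, secondary structure and specificity, none of which this sketch attempts.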

Full length genes may be cloned utilizing partial nucleotide
sequence and various methods known in the art.
Gobinda et al (1993; PCR Methods Applic 2:318-22) disclose
"restriction-site PCR" as a direct method which uses
universal primers to retrieve unknown sequence adjacent to
a known locus. First, genomic DNA is amplified in the
presence of a primer to a linker and a primer specific to the
known region. The amplified sequences are subjected to a
second round of PCR with the same linker primer and
another specific primer internal to the first one. Products of
each round of PCR are transcribed with an appropriate RNA
polymerase and sequenced using reverse transcriptase.
Gobinda et al present data concerning Factor IX for which
they identified a conserved stretch of 20 nucleotides in the
3' noncoding region of the gene.

Inverse PCR is the first method to report successful
acquisition of unknown sequences starting with primers
based on a known region (Triglia T. et al (1988) Nucleic
Acids Res 16:8186). The method uses several restriction
enzymes to generate a suitable fragment in the known region
of a gene. The fragment is then circularized by intramolecular
ligation and used as a PCR template. Divergent primers
are designed from the known region. The multiple rounds of
restriction enzyme digestions and ligations that are necessary
prior to PCR make the procedure slow and expensive
(Gobinda et al, supra).
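
The effect of circularization followed by amplification with divergent primers can be illustrated with plain string operations. This is a simplified model under stated assumptions: only the top strand is tracked, the restriction sites at the ligation junction are ignored, and the function name and toy sequences are hypothetical.

```python
def inverse_pcr_product(fragment: str, known: str, primer_len: int = 4) -> str:
    """Model the top strand of an inverse PCR product.

    `fragment` is a linear restriction fragment laid out as
    left flank + known region + right flank. Ligation joins the end of the
    right flank to the start of the left flank. Divergent primers sit at the
    two ends of the known region and point outward, so the product reads:
    end of known region, right flank, ligation junction, left flank,
    start of known region.
    """
    idx = fragment.find(known)
    if idx < 0:
        raise ValueError("known region not found in fragment")
    if 2 * primer_len > len(known):
        raise ValueError("known region too short for two divergent primers")
    left_flank = fragment[:idx]
    right_flank = fragment[idx + len(known):]
    outward_primer_site = known[-primer_len:]  # extends into the right flank
    return_primer_site = known[:primer_len]    # binding site of the opposing primer
    return outward_primer_site + right_flank + left_flank + return_primer_site


if __name__ == "__main__":
    # Lower case marks the unknown flanks; upper case marks the known region.
    print(inverse_pcr_product("aaaa" + "GGGGCCCC" + "tttt", "GGGGCCCC"))
```

The two unknown flanks end up adjacent in the product and in swapped order relative to the genome, which is why the position of the ligation junction must be known to interpret the resulting sequence.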

Capture PCR (Lagerstrom M. et al (1991) PCR Methods
Applic 1:111-19) is a method for PCR amplification of DNA
fragments adjacent to a known sequence in human and YAC
DNA. As noted by Gobinda et al (supra), capture PCR also
requires multiple restriction enzyme digestions and ligations
to place an engineered double-stranded sequence into an
unknown portion of the DNA molecule before PCR.
Although the restriction and ligation reactions are carried
out simultaneously, the requirements for extension, immobilization
and two rounds of PCR and purification prior to
sequencing render the method cumbersome and time consuming.

Parker J. D. et al (1991; Nucleic Acids Res 19:3055-60)
teach walking PCR, a method for targeted gene walking
which permits retrieval of unknown sequence. Promoter- 
Finder™ is a new kit available from Clontech (Palo Alto, 
Calif.) which uses PCR and primers derived from p53 to 
walk in genomic DNA. Nested primers and special Promot- 
erFinder libraries are used to detect upstream sequences 
such as promoters and regulatory elements. This process 
avoids the need to screen libraries and is useful in finding 
intron/exon junctions. 

Another new PCR method, "Improved Method for 
Obtaining Full Length cDNA Sequences" by Guegler et al, 
patent application Ser. No. 08/487,112, filed Jun. 7, 1995 and
hereby incorporated by reference, employs XL-PCR 
(Perkin-Elmer, Foster City, Calif.) to amplify and extend 
partial nucleotide sequence into longer pieces of DNA. This 
method was developed to allow a single researcher to 
process multiple genes (up to 20 or more) at one time and to 
obtain an extended (possibly full-length) sequence within 
6-10 days. This new method replaces methods which use 
labelled probes to screen plasmid libraries and allow one 
researcher to process only about 3-5 genes in 14-40 days. 

In the first step, which can be performed in about two 
days, any two of a plurality of primers are designed and 
synthesized based on a known partial sequence. In step 2, 
which takes about six to eight hours, the sequence is 
extended by PCR amplification of a selected library. Steps 3 
and 4, which take about one day, are purification of the 
amplified cDNA and its ligation into an appropriate vector. 
Step 5, which takes about one day, involves transforming 
and growing up host bacteria. In step 6, which takes approxi- 
mately five hours, PCR is used to screen bacterial clones for 
extended sequence. The final steps, which take about one 
day, involve the preparation and sequencing of selected 
clones. 
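
The step times listed above can be added up as a rough check against the quoted 6-10 day turnaround. The sketch below is only an illustration: it treats "about one day" as 24 hours, which the text does not specify, and all names in it are ours.

```python
# Durations in hours as (label, low, high).
# Assumption: "about one day" = 24 h and "about two days" = 48 h.
STEPS = [
    ("step 1: primer design and synthesis", 48, 48),
    ("step 2: PCR extension of library", 6, 8),
    ("steps 3-4: purification and ligation", 24, 24),
    ("step 5: transformation and growth", 24, 24),
    ("step 6: PCR screen of clones", 5, 5),
    ("final steps: preparation and sequencing", 24, 24),
]


def total_hours(steps):
    """Return the (low, high) total number of hours for one pass."""
    low = sum(lo for _, lo, _ in steps)
    high = sum(hi for _, _, hi in steps)
    return low, high


low, high = total_hours(STEPS)
print(f"one pass: {low}-{high} h ({low / 24:.1f}-{high / 24:.1f} days)")
```

On these assumptions a single pass accounts for roughly five and a half days of listed step time; the 6-10 day figure presumably leaves room for hand-offs or a repeated round, although that reading is ours.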

If the full length cDNA has not been obtained, the entire 
procedure is repeated using either the original library or 
some other preferred library. The preferred library may be 
one that has been size-selected to include only larger cDNAs 
or may consist of single or combined commercially avail- 
able libraries, eg. lung, liver, heart and brain from Gibco/ 
BRL (Gaithersburg, Md.). The cDNA library may have been
prepared with oligo (dT) or random priming. Random 
primed libraries are preferred in that they will contain more 
sequences which contain 5' ends of genes. A randomly 
primed library may be particularly useful if an oligo (dT) 
library does not yield a complete gene. It must be noted that 
the larger and more complex the protein, the less likely it is 
that the complete gene will be found in a single plasmid. 

A new method for analyzing either the size or the nucle- 
otide sequence of PCR products is capillary electrophoresis. 
Systems for rapid sequencing are available from Perkin 
Elmer (Foster City, Calif.), Beckman Instruments (Fullerton,
Calif.), and other companies. Capillary sequencing employs 
flowable polymers for electrophoretic separation, four dif- 
ferent fluorescent dyes (one for each nucleotide) which are 
laser activated, and detection of the emitted wavelengths by 
a charge coupled device camera. Output/light intensity is
converted to electrical signal using appropriate software (eg.
Genotyper™ and Sequence Navigator™ from Perkin
Elmer) and the entire process from loading of samples to 
computer analysis and electronic data display is computer
controlled. Capillary electrophoresis provides greater
resolution and is many times faster than standard gel based
procedures. It is particularly suited to the sequencing of
small pieces of DNA which might be present in limited
amounts in a particular sample. The reproducible sequencing
of up to 350 bp of M13 phage DNA in 30 min has been
reported (Ruiz-Martinez M. C. et al (1993) Anal Chem
65:2851-8).

Another aspect of the subject invention is to provide for
kinase hybridization probes which are capable of hybridizing
with naturally occurring nucleotide sequences encoding
kinases. The stringency of the hybridization conditions will
determine whether the probe identifies only the native
nucleotide sequence of that specific kinase or sequences of
closely related molecules. If degenerate kinase nucleotide
sequences of the subject invention are used for the detection
of related kinase encoding sequences, they should preferably
contain at least 50% of the nucleotides of the sequences
presented herein. Hybridization probes of the subject
invention may be derived from the nucleotide sequences of the
SEQ ID NOs 1-44, or from surrounding or included
genomic sequences comprising untranslated regions such as
promoters, enhancers and introns. Such hybridization probes
may be labelled with appropriate reporter molecules. Means
for producing specific hybridization probes for kinases
include oligolabelling, nick translation, end-labelling or
PCR amplification using a labelled nucleotide. Alternatively,
the cDNA sequence may be cloned into a vector for the
production of mRNA probe. Such vectors are known in the
art, are commercially available, and may be used to
synthesize RNA probes in vitro by addition of an appropriate RNA
polymerase such as T7, T3 or SP6 and labelled nucleotides.
A number of companies (such as Pharmacia Biotech,
Piscataway, N.J.; Promega, Madison, Wis.; US Biochemical
Corp, Cleveland, Ohio; etc.) supply commercial kits and
protocols for these procedures.

It is also possible to produce a DNA sequence, or portions
thereof, entirely by synthetic chemistry. Sometimes the
source of information for producing this sequence comes
from the known homologous sequence from closely related
organisms. After synthesis, the nucleic acid sequence can be
used alone or joined with a preexisting sequence and
inserted into one of the many available DNA vectors and
their respective host cells using techniques well known in
the art. Moreover, synthetic chemistry may be used to
introduce specific mutations into the nucleotide sequence.
Alternatively, a portion of sequence in which a mutation is
desired can be synthesized and recombined with a portion of
an existing genomic or recombinant sequence.

The kinase nucleotide sequences can be used individually,
or in panels, in a diagnostic test or assay to detect disorder
or disease processes associated with abnormal levels of
kinase expression. The nucleotide sequence is added to a
sample (fluid, cell or tissue) from a patient under hybridizing
conditions. After an incubation period, the sample is washed
with a compatible fluid which optionally contains a reporter
molecule which will bind the specific nucleotide. After the
compatible fluid is rinsed off, the reporter molecule is
quantitated and compared with a standard for that fluid, cell
or tissue. If kinase expression is significantly different from
the standard, the assay indicates the presence of disorder or
disease. The form of such qualitative or quantitative
methods may include northern analysis, dot blot or other
membrane based technologies, dip stick, pin or chip technologies,
PCR, ELISAs or other multiple sample format technologies.

This same assay, combining a sample with the nucleotide
sequence, is applicable in evaluating the efficacy of a
particular therapeutic treatment regime. It may be used in
animal studies, in clinical trials, or in monitoring the
treatment of an individual patient. First, standard expression
must be established for use as a basis of comparison.
Second, samples from the animals or patients affected by the
disorder or disease are combined with the nucleotide
sequence to evaluate the deviation from the standard or
normal profile. Third, an existing therapeutic agent is
administered, and a treatment profile is generated. The assay
is evaluated to determine whether the profile progresses
toward or returns to the standard pattern. Successive
treatment profiles may be used to show the efficacy of treatment
over a period of several days or several months.

The nucleotide sequence for any particular kinase (SEQ
ID NOs 1-45) can also be used to generate probes for
mapping the native genomic sequence. The sequence may be
mapped to a particular chromosome or to a specific region
of the chromosome using well known techniques. These
include in situ hybridization to chromosomal spreads
([illegible] Human Chromosomes: A Manual of
Basic Techniques, Pergamon Press, New York City),
flow-sorted chromosomal preparations, or artificial chromosome
constructions such as yeast artificial chromosomes (YACs),
bacterial artificial chromosomes (BACs), bacterial P1
constructions or single chromosome cDNA libraries.

In situ hybridization of chromosomal preparations and
physical mapping techniques such as linkage analysis using
established chromosomal markers are invaluable in
extending genetic maps. Examples of genetic maps can be found in
the 1994 Genome Issue of Science (265:1981f). Often the
placement of a gene on the chromosome of another
mammalian species may reveal associated markers even if the
number or arm of a particular human chromosome is not
known. New partial nucleotide sequences can be assigned to
chromosomal arms, or parts thereof, by physical mapping.
This provides valuable information to investigators
searching for disease genes using positional cloning or other gene
discovery techniques. Once a disease or syndrome, such as
ataxia telangiectasia (AT), has been crudely localized by
genetic linkage to a particular genomic region, for example,
AT to 11q22-23 (Gatti et al (1988) Nature 336:577-580),
any sequences mapping to that area may represent genes for
further investigation. The nucleotide sequences of the
subject invention may also be used to detect differences in the
chromosomal location of nucleotide sequences due to
translocation, inversion, etc. between normal and carrier or
affected individuals.

The partial nucleotide sequence encoding a particular
kinase may be used to produce an amino acid sequence using
well known methods of recombinant DNA technology.
Goeddel (1990, Gene Expression Technology, Methods in
Enzymology, Vol 185, Academic Press, San Diego, Calif.) is
one among many publications which teach expression of an
isolated, purified nucleotide sequence. The amino acid or
peptide may be expressed in a variety of host cells, either
prokaryotic or eukaryotic. Host cells may be from the same
species from which the nucleotide sequence was derived or
from a different species. Advantages of producing an amino
acid sequence or peptide by recombinant DNA technology
include obtaining adequate amounts for purification and the
availability of simplified purification procedures.

Cells transformed with a kinase nucleotide sequence may
be cultured under conditions suitable for the expression and
recovery of peptide from cell culture. The peptide produced
by a recombinant cell may be secreted or may be contained
intracellularly depending on the sequence itself and/or the
vector used. In general, it is more convenient to prepare
recombinant proteins in secreted form, and this is
accomplished by ligating kin to a recombinant nucleotide sequence
which directs its movement through a particular prokaryotic
or eukaryotic cell membrane. Other recombinant
constructions may join kin to nucleotide sequence encoding a
polypeptide domain which will facilitate protein purification
(Kroll D. J. et al (1993) DNA Cell Biol 12:441-53).

Direct peptide synthesis using solid-phase techniques
(Stewart et al (1969) Solid-Phase Peptide Synthesis, WH
Freeman Co, San Francisco, Calif.; Merrifield J. (1963) J
Am Chem Soc 85:2149-2154) is an alternative to
recombinant or chimeric peptide production. Automated synthesis
may be achieved, for example, using Applied Biosystems
431A Peptide Synthesizer in accordance with the
instructions provided by the manufacturer. Additionally a particular
kinase sequence or any part thereof may be mutated during
direct synthesis and combined using chemical methods with
other kinase sequence(s) or a part thereof. This chimeric
nucleotide sequence can also be placed in an appropriate
vector and host cell to produce a variant peptide.

Although an amino acid sequence or oligopeptide used for
antibody induction does not require biological activity, it
must be immunogenic. KIN used to induce specific
antibodies may have an amino acid sequence consisting of at
least five amino acids and preferably at least 10 amino acids.
Short stretches of amino acid sequence may be fused with
those of another protein such as keyhole limpet hemocyanin,
and the chimeric peptide used for antibody production.
Alternatively, the oligopeptide may be of sufficient length to
contain an entire domain.

Antibodies specific for KIN may be produced by
inoculation of an appropriate animal with an antigenic fragment of
the peptide. An antibody is specific for KIN if it is produced
against an epitope of the polypeptide and binds to at least
part of the natural or recombinant protein. Antibody
production includes not only the stimulation of an immune
response by injection into animals, but also analogous
processes such as the production of synthetic antibodies, the
screening of recombinant immunoglobulin libraries for
[illegible] 256:1275-1281), or the in vitro stimulation of
lymphocyte [illegible] ([illegible] (1991) Nature 349:293-299)
provides for a number of highly specific binding reagents
based on the principles of antibody formation. These
techniques may be adapted to produce molecules which
specifically bind kinase peptides. Antibodies or other
appropriate molecules generated against a specific immunogenic
peptide fragment or oligopeptide can be used in Western
analysis, enzyme-linked immunosorbent assays (ELISA) or
similar tests to establish the presence of or to quantitate
amounts of kinase active in normal, diseased, or
therapeutically treated cells or tissues.

The examples below are provided to illustrate the subject
invention. These examples are provided by way of
illustration and are not included for the purpose of limiting the
invention.

EXAMPLES

I cDNA Library Construction

The kinase sequences of this application (Table 1) were
first identified among the sequences comprising various
libraries. Technology has advanced considerably since the
first cDNA libraries were made. Many small variations in
both chemicals and machinery have been instituted over
time, and these have improved both the efficiency and safety
of the process. Although the cDNAs could be obtained using
an older procedure, the procedure presented in this
application is exemplary of one currently being used by persons
skilled in the art. For the purpose of providing an exemplary
method, the tissue preparation, mRNA isolation and cDNA
library construction described here is for the rheumatoid
synovium library from which the Incyte Clones 191283 and
192268 for ser/thr kinases were obtained.

Rheumatoid synovial tissue was obtained from the hip
joint removed from a 68 year old female with erosive,
nodular rheumatoid arthritis. The tissue was frozen, ground
to powder in a mortar and pestle, and lysed immediately in
buffer containing guanidinium isothiocyanate. The lysate
was centrifuged over a CsCl cushion (18 hrs at 25,000 rpm
using a Beckman SW28 rotor and ultracentrifuge; Beckman
Instruments, Palo Alto, Calif.), ethanol precipitated,
resuspended in water and DNase treated for 15 min at 37° C. The
RNA was extracted with phenol chloroform and precipitated
with ethanol. Polyadenylated messages were isolated using
Qiagen Oligotex (QIAGEN Inc, Chatsworth, Calif.), and a
custom cDNA library was constructed by Stratagene (La
Jolla, Calif.).

First strand cDNA synthesis was accomplished using an
oligo (dT) primer/linker which also contained an XhoI
restriction site. Second strand synthesis was performed
using a combination of DNA polymerase I, E. coli ligase and
[illegible] ended cDNA [illegible]. The
cDNA was then digested with XhoI restriction enzyme,
[illegible] and fractionated by [illegible]
on Sephacryl S400. DNA of the appropriate size was then
ligated to dephosphorylated Lambda Zap® arms
(Stratagene) and packaged using Gigapack extracts
[illegible] Bluescript (Stratagene) phagemid DNAs
[illegible]. Alternatively, DNAs were purified using Miniprep
[illegible] #77468; Advanced Genetic Technologies
[illegible], Gaithersburg, Md.). These kits provide a
96-well format and [illegible] for 960
[illegible] each [illegible] has
[illegible] for the following changes. The 96
wells are each filled with only 1 ml of sterile Terrific broth
(LIFE TECHNOLOGIES™, Gaithersburg, Md.) with
carbenicillin [illegible] (2xCarb) and [illegible] at 0.4%. After the
wells are inoculated, the bacteria are cultured for 24 hours
and [illegible] 60 [illegible]. A centrifugation [illegible]
for 5 [illegible] is performed before the contents
[illegible]
[illegible] isopropanol to TRIS buffer is not
[illegible] performed. After the last step in the protocol,
[illegible] transferred to a Beckman 96-well block for
storage.

II Sequencing of cDNA Clones

The cDNA inserts from random isolates of the rheumatoid
synovium or other appropriate library were sequenced in
part. Methods for DNA sequencing are well known in the art
and employ such enzymes as the Klenow fragment of DNA
polymerase I, SEQUENASE® (US Biochemical Corp) or
Taq polymerase. Methods to extend the DNA from an
oligonucleotide primer annealed to the DNA template of
interest have been developed for both single- and
double-stranded templates. Chain termination reaction products
were separated using electrophoresis and detected via their
incorporated, labelled precursors. Recent improvements in
mechanized reaction preparation, sequencing and analysis
have permitted expansion in the number of sequences that
can be determined per day. Preferably, the process is
automated with machines such as the Hamilton Micro Lab 2200
(Hamilton, Reno, Nev.), Peltier Thermal Cycler (PTC200;
MJ Research, Watertown Mass.) and the Applied Biosys- 
tems Catalyst 800 and 377 and 373 DNA sequencers. 

The quality of any particular cDNA library may be
determined by performing a pilot scale analysis of 192
cDNAs and checking for percentages of clones containing
vector, lambda or E. coli DNA, mitochondrial or repetitive
DNA, and clones with exact or homologous matches to
public databases. The number of unique sequences (those
having no known match in any available database) was
recorded.

III Homology Searching of cDNA Clones and Their
Deduced Proteins 

Each sequence so obtained was compared to sequences in 
GenBank using a search algorithm developed by Applied 
Biosystems and incorporated into the INHERIT™ 670 
Sequence Analysis System. In this algorithm, Pattern
Specification Language (TRW Inc, Los Angeles, Calif.) was used
to determine regions of homology. The three parameters that 
determine how the sequence comparisons run were window 
size, window offset, and error tolerance. Using a combina- 
tion of these three parameters, the DNA database was 
searched for sequences containing regions of homology to 
the query sequence, and the appropriate sequences were 
scored with an initial value. Subsequently, these homolo- 
gous regions were examined using dot matrix homology 
plots to distinguish regions of homology from chance 
matches. Smith-Waterman alignments were used to display
the results of the homology search. 
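
The Smith-Waterman alignment named above can be sketched as follows. This is a generic textbook version of local alignment, not the INHERIT implementation, which the text does not describe; the scoring values (match +2, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scores are illustrative assumptions; gaps use a simple linear penalty.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,   # align the two residues
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTAAA", "GGACGTCC"))
```

Because cell scores are floored at zero, the alignment covers only the best-matching local region rather than the full length of either sequence, which is why it suits the display of regions of homology.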

Peptide and protein sequence homologies were ascer- 
tained using the INHERIT™ 670 Sequence Analysis System 
in a way similar to that used in DNA sequence homologies. 
Pattern Specification Language and parameter windows 
were used to search protein databases for sequences con- 
taining regions of homology which were scored with an 
initial value. Dot-matrix homology plots were examined to 
distinguish regions of significant homology from chance 
matches. 

Alternatively, BLAST, which stands for Basic Local 
Alignment Search Tool, is used to search for local sequence 
alignments (Altschul S. F. (1993) J Mol Evol 36:290-300; 
Altschul, S. F. et al (1990) J Mol Biol 215:403-10). BLAST 
produces alignments of both nucleotide and amino acid 
sequences to determine sequence similarity. Because of the 
local nature of the alignments, BLAST is especially useful 
in determining exact matches or in identifying homologs. 
While it is useful for matches which do not contain gaps, it 
is inappropriate for performing motif-style searching. The 
fundamental unit of BLAST algorithm output is the High- 
scoring Segment Pair (HSP). 

An HSP consists of two sequence fragments of arbitrary
but equal lengths whose alignment is locally maximal and
for which the alignment score meets or exceeds a threshold
or cutoff score set by the user. The BLAST approach is
to look for HSPs between a query sequence and a database 
sequence, to evaluate the statistical significance of any 
matches found, and to report only those matches which 
satisfy the user-selected threshold of significance. The 
parameter E establishes the statistically significant threshold 
for reporting database sequence matches. E is interpreted as 
the upper bound of the expected frequency of chance 
occurrence of an HSP (or set of HSPs) within the context of 
the entire database search. Any database sequence whose 
match satisfies E is reported in the program output. 
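
The role of E as a reporting threshold can be illustrated with the standard Karlin-Altschul estimate, E = K * m * n * exp(-lambda * S), where m and n are the query and database lengths and S is the HSP score. The text does not give this formula or any parameter values; the values of K and lambda below are placeholders, and the function names are ours.

```python
import math


def expected_chance_hits(score, query_len, db_len, K=0.1, lam=0.3):
    """Karlin-Altschul estimate of chance HSPs scoring at least `score`.

    K and lam depend on the scoring system; the defaults here are
    placeholder assumptions, not values from any real matrix.
    """
    return K * query_len * db_len * math.exp(-lam * score)


def reportable(hsps, query_len, db_len, e_threshold):
    """Keep only the (name, score) pairs whose estimate satisfies E."""
    kept = []
    for name, score in hsps:
        e = expected_chance_hits(score, query_len, db_len)
        if e <= e_threshold:
            kept.append((name, score, e))
    return kept


hits = [("hit_a", 100), ("hit_b", 20)]  # hypothetical HSP scores
for name, score, e in reportable(hits, 300, 1_000_000, e_threshold=10.0):
    print(name, score, f"{e:.2e}")
```

Raising the score lowers the estimate exponentially, so a modest change in the user-selected E can remove a large number of low-scoring matches from the output.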

All the kinase molecules presented in this application 
were examined using INHERIT. Although their identifica- 
tion was based on the criteria above, their homology to 
known kinase molecules and name are subject to change
when additional computer analysis against additional or 
more recent database information is employed. For example, 
whereas the first two kinases in Table 1 were initially 
identified as unique Incyte clones, homologous mouse and 
human kinases are now known. In other cases, additional 
sequence information has become available and its review 
against the known databases has precipitated a name change. 
Occasionally a clone number will also disappear from the 
LIFESEQ™ database (Incyte Pharmaceuticals Inc, Palo 
Alto, Calif.). This situation generally arises during the 
regular review of clones and assembly of contiguous 
sequences. 

IV Extension of cDNAs to Full Length 

The kinase sequences presented here can be used to 
design oligonucleotide primers for the extension of the 
cDNAs to full length. In fact, the partial map kinase cDNA 
sequence (SEQ ID NO 38) initially identified in Incyte clone 
214915 among the sequences comprising the human stom- 
ach cell library was extended to full length as shown in "A 
Novel Human Map Kinase Homolog" by Hawkins et al. 
Incyte Docket PF-036P, filed on Jun. 28, 1995, incorporated 
herein by reference. The coding region of this full length 
sequence (SEQ ID NO 45; Incyte Clone 214915E) begins at 
nucleotide 58 and ends at nucleotide 1156. 

Primers are designed based on known sequence; one 
primer is synthesized to initiate extension in the antisense 
direction (XLR) and the other to extend sequence in the 
sense direction (XLF). The primers allow the sequence to be 
extended "outward" generating amplicons containing new, 
unknown nucleotide sequence for the gene of interest. The 
primers may be designed using Oligo 4.0 (National Bio- 
sciences Inc, Plymouth, Minn.), or another appropriate 
program, to be 22-30 nucleotides in length, to have a GC 
content of 50% or more, and to anneal to the target sequence 
at temperatures about 68°-72° C. Any stretch of nucleotides 
which would result in hairpin structures and primer-primer 
dimerizations was avoided. 
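
The length and GC criteria above can be checked mechanically. The sketch below applies them to the XLR and XLF primer sequences given for clone 214915; it does not estimate annealing temperature or screen for hairpins or primer dimers, which a program such as Oligo 4.0 would handle, and the function name is ours.

```python
def check_primer(primer, min_len=22, max_len=30, min_gc=0.50):
    """Check a primer against the stated length and GC-content criteria."""
    seq = primer.replace(" ", "").upper()
    gc = sum(1 for base in seq if base in "GC") / len(seq)
    return {
        "length": len(seq),
        "gc": gc,
        "length_ok": min_len <= len(seq) <= max_len,
        "gc_ok": gc >= min_gc,
    }


# Primer sequences as printed in this example (spaces are cosmetic).
XLR = "AAG ACA TCC AGG AGC CCA ATG AC"
XLF = "AGG TGA TCC TCA GCT GGA TGC AC"

for name, primer in (("XLR", XLR), ("XLF", XLF)):
    result = check_primer(primer)
    print(name, result["length"], f"{result['gc']:.1%}",
          result["length_ok"], result["gc_ok"])
```

Both primers are 23 nucleotides long with a GC content just over 50%, so they satisfy the two criteria that can be verified from sequence alone.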

The stomach cDNA library was used as a template, and 
XLR-AAG ACA TCC AGG AGC CCA ATG AC and 
XLF-AGG TGA TCC TCA GCT GGA TGC AC primers 
were used to extend and amplify the 214915 sequence. By 
following the instructions for the XL-PCR kit and thoroughly
mixing the enzyme and reaction mix, high fidelity
amplification is obtained. Beginning with 25 pMol of each 
primer and the recommended concentrations of all other 
components of the kit, PCR is performed using the Peltier 
Thermal Cycler (PTC200; MJ Research, Watertown, Mass.) 
and the following parameters: 

Step 1 94° C. for 60 sec (initial denaturation) 

Step 2 94° C. for 15 sec 

Step 3 65° C. for 1 min 

Step 4 68° C. for 7 min 

Step 5 Repeat step 2-4 for 15 additional cycles 

Step 6 94° C. for 15 sec 

Step 7 65° C. for 1 min 

Step 8 68° C. for 7 min+15 sec/cycle 

Step 9 Repeat step 6-8 for 11 additional cycles 

Step 10 72° C. for 8 min 

Step 11 4° C. (and holding) 
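
The cycle count implied by the program above can be tallied as follows. The sketch counts each "repeat for N additional cycles" as one initial pass plus N repeats, and it assumes that the "+15 sec/cycle" extension in step 8 adds 15 sec on the first cycle of that block, 30 sec on the second, and so on; the text does not say where the increment starts. Ramp times between temperatures are ignored.

```python
def block_seconds(step_seconds, cycles, extra_per_cycle=0):
    """Total hold time for a cycling block.

    Assumption: cycle k (k = 1..cycles) adds k * extra_per_cycle seconds.
    """
    base = sum(step_seconds) * cycles
    extra = extra_per_cycle * cycles * (cycles + 1) // 2
    return base + extra


def summarize_program():
    """Return (total cycles, total hold seconds) for steps 1-10."""
    first_cycles = 1 + 15    # steps 2-4, then 15 additional cycles
    second_cycles = 1 + 11   # steps 6-8, then 11 additional cycles
    cycle_steps = (15, 60, 7 * 60)  # 94 C, 65 C and 68 C holds in seconds
    seconds = (
        60                                            # step 1
        + block_seconds(cycle_steps, first_cycles)
        + block_seconds(cycle_steps, second_cycles, extra_per_cycle=15)
        + 8 * 60                                      # step 10
    )
    return first_cycles + second_cycles, seconds


cycles, seconds = summarize_program()
print(cycles, "cycles;", round(seconds / 3600, 2), "h of programmed holds")
```

The two blocks contribute 16 and 12 cycles, which agrees with the 28 cycles referred to in the text.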

At the end of 28 cycles, 50 µl of the reaction mix was
removed; and the remaining reaction mix was run for an 
additional 10 cycles as outlined below: 

Step 1 94° C. for 15 sec 

Step 2 65° C. for 1 min 
Step 3 68° C. for (10 min+15 sec)/cycle
Step 4 Repeat step 1-3 for 9 additional cycles 
Step 5 72° C. for 10 min 

A 5-10 µl aliquot of the reaction mixture is analyzed by
electrophoresis on a low concentration (about 0.6-0.8%) 
agarose mini-gel to determine which reactions were suc- 
cessful in extending the sequence. Although all extensions 
potentially contain a full length gene, some of the largest 
products or bands are selected and cut out of the gel. Further 
purification involves using a commercial gel extraction 
method such as QIAQuick™ (QIAGEN Inc). After recovery
of the DNA, Klenow enzyme is used to trim single-stranded, 
nucleotide overhangs creating blunt ends which facilitate 
religation and cloning. 

After ethanol precipitation, the products are redissolved in
13 µl of ligation buffer. Then, 1 µl T4-DNA ligase (15 units)
and 1 µl T4 polynucleotide kinase are added, and the mixture
is incubated at room temperature for 2-3 hours or overnight
at 16° C. Competent E. coli cells (in 40 µl of appropriate
media) are transformed with 3 µl of ligation mixture and
cultured in 80 µl of SOC medium (Sambrook J. et al, supra).
After incubation for one hour at 37° C., the whole
transformation mixture is plated on Luria Bertani (LB)-agar
(Sambrook J. et al, supra) containing 2xCarb. The following
day, 12 colonies are randomly picked from each plate and
cultured in 150 µl of liquid LB/2xCarb medium placed in an
individual well of an appropriate, commercially-available,
sterile 96-well microtiter plate. The following day, 5 µl of
each overnight culture is transferred into a non-sterile
96-well plate and after dilution 1:10 with water, 5 µl of each
sample is transferred into a PCR array.

For PCR amplification, 15 µl of concentrated PCR reaction
mix (1.33x) containing 0.75 units of Taq polymerase, a
vector primer and one or both of the gene specific primers 
used for the extension reaction are added to each well. 
Amplification is performed using the following conditions: 

Step 1 94° C. for 60 sec 

Step 2 94° C. for 20 sec 

Step 3 55° C. for 30 sec 

Step 4 72° C. for 90 sec 

Step 5 Repeat steps 2-4 for an additional 29 cycles 
Step 6 72° C. for 180 sec 
Step 7 4° C. (and holding) 
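
The 1.33x figure for the concentrated reaction mix follows from the volumes used in this screen. The sketch below assumes that the amount of diluted culture placed in each well of the PCR array is 5 microliters, so that 15 microliters of mix brings the well to 20 microliters; the 10% overage used when preparing a plate's worth of mix is also an assumption, not something stated in the text.

```python
def final_concentration(stock_x, stock_ul, sample_ul):
    """Concentration (in x) after the stock is diluted by the sample volume."""
    return stock_x * stock_ul / (stock_ul + sample_ul)


def mix_volume_ul(n_wells, per_well_ul=15.0, overage=0.10):
    """Volume of concentrated mix to prepare, with a pipetting allowance.

    The 10% overage is an illustrative assumption.
    """
    return n_wells * per_well_ul * (1.0 + overage)


print(f"final concentration: {final_concentration(1.33, 15, 5):.4f}x")
print(f"mix for 96 wells: {mix_volume_ul(96):.0f} microliters")
```

With these volumes the mix ends up at very nearly 1x in each well, with the 0.75 units of Taq polymerase contained in a 20 microliter reaction.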

Aliquots of the PCR reactions are run on agarose gels 
together with molecular weight markers. The sizes of the 
PCR products are compared to the original partial cDNAs, 
and appropriate clones are selected, ligated into plasmid and 
sequenced. 

V Diagnostic Assays Using Kinase Specific Oligomers 

In those cases where a specific disorder or disease (see 
definitions supra) is suspected to involve altered quantities 
of a particular kinase, oligomers may be designed to estab- 
lish the presence and/or quantity of mRNA expressed in a 
biological sample. There are several methods currently 
being used to quantitate the expression of a particular 
molecule. Most of these methods use radiolabeled (Melby 
P. C. et al 1993 J Immunol Methods 159:235-44) or bioti- 
nylated (Duplaa C. et al 1993 Anal Biochem 229-36) 
nucleotides, coamplification of a control nucleic acid, and 
standard curves onto which the experimental results are 
interpolated. For example, phosphorylase B kinase defi- 
ciency may manifest as hepatomegaly which is inherited as 
either an X-linked or autosomal recessive trait or myoglo- 
binuria whose inheritance is unknown. 

Oligomers for phosphorylase B kinase are first used in 
quantitative PCR to establish a normal range for expression 
of phosphorylase B kinase. Then, these same oligomers are
used with extracts of cells from patients with inherited
phosphorylase B kinase deficiency. The information from
such studies is used to define different inheritance patterns
and to diagnose future patients displaying phosphorylase B
kinase deficiency-like symptoms. In like manner, this same
assay can be used to monitor progress of the patient as
his/her physiological situation moves toward the normal
range during therapy for the condition.
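
Interpolating an experimental result onto a standard curve, as described above, can be sketched as below. The sketch uses piecewise-linear interpolation between control points for simplicity; the text does not specify the curve-fitting method, and the signal values, quantities and normal range shown are hypothetical.

```python
def interpolate_quantity(signal, standards):
    """Read a quantity off a standard curve by linear interpolation.

    standards: (signal, quantity) pairs measured for the control nucleic acid.
    """
    pts = sorted(standards)
    if len(pts) < 2:
        raise ValueError("need at least two standards")
    if not pts[0][0] <= signal <= pts[-1][0]:
        raise ValueError("signal lies outside the standard curve")
    for (s0, q0), (s1, q1) in zip(pts, pts[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return q0
            frac = (signal - s0) / (s1 - s0)
            return q0 + frac * (q1 - q0)


def within_normal(quantity, normal_range):
    """True if the quantity falls inside the established normal range."""
    low, high = normal_range
    return low <= quantity <= high


standards = [(100.0, 1.0), (200.0, 2.0), (400.0, 4.0)]  # hypothetical controls
patient = interpolate_quantity(150.0, standards)
print(patient, within_normal(patient, (1.8, 3.2)))
```

Signals outside the range of the standards are rejected rather than extrapolated, since the curve is only defined by the measured controls.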

VI Kinases Kit 

The kinases of the subject invention are used to produce
a kinases kit for diagnosing disorders or diseases associated
with altered kinase expression. This involves designing
a plurality of oligomers, one set of which is specific for each
kinase or kinase regulatory sequence. Specificity in this case
refers to sequence similarity, to the length of the nucleic acid
molecule amplified, to cell or tissue type being screened or
to the disorder or disease. These oligomers are combined
with a biological sample obtained from a patient in a
solution sufficient for PCR and amplified. The PCR products
are examined first, to detect the expression of each kinase,
and second to quantify the expression of each kinase. Kinase
expression is compared with standard ranges for normal and
abnormal expression. In the case(s) where kinase expression
is altered, use of the kit has provided the physician with a
named disorder or disease which can be treated or further
investigated.

A further use of the oligomers from the kinases kit is in
a diagnostic assay of example V (above) used to monitor
patient response to drug therapy. Once the disease has been
named and a therapy chosen, the oligomers specific to the
patient's disease may be used periodically to monitor the
efficacy of the chosen therapy. In this case, the specific
oligomers are combined with a biological sample from the
patient in a solution sufficient for PCR and amplified. The
PCR product is quantified and compared with a normal
standard and with the pretreatment profile of the patient. If
the kinase expression is tending toward normal, the therapy
may be considered effective; if the expression is even more
abnormal, therapy should be discontinued and an alternative
treatment instituted.
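
The comparison against the normal standard and the pretreatment profile can be expressed as a simple rule. The sketch below measures how far an expression value lies outside a normal range and compares that distance before and during therapy; the numeric values, units and function names are hypothetical and serve only to illustrate the logic described above.

```python
def distance_from_normal(value, normal_low, normal_high):
    """How far a value lies outside the normal range (0 if inside it)."""
    if value < normal_low:
        return normal_low - value
    if value > normal_high:
        return value - normal_high
    return 0.0


def assess_therapy(pretreatment, current, normal_low, normal_high):
    """Classify the trend relative to the patient's pretreatment profile."""
    before = distance_from_normal(pretreatment, normal_low, normal_high)
    after = distance_from_normal(current, normal_low, normal_high)
    if after < before:
        return "tending toward normal"
    if after > before:
        return "more abnormal"
    return "unchanged"


# Hypothetical expression values in arbitrary units.
print(assess_therapy(pretreatment=9.0, current=6.5,
                     normal_low=2.0, normal_high=5.0))
```

Repeating the comparison at each sampling point gives the periodic monitoring described in the text.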

VII Sense or Antisense Molecules 

Knowledge of the correct cDNA sequence of any
particular kinase, its regulatory elements or parts thereof will
enable its use as a tool in sense (Youssoufian H. and H. F.
Lodish (1993) Mol Cell Biol 13:98-104) or antisense
(Eguchi et al (1991) Annu Rev Biochem 60:631-652)
technologies for the investigation of gene function.
Oligonucleotides, from genomic or cDNAs, comprising
either the sense or the antisense strand of the cDNA
sequence can be used in vitro or in vivo to inhibit expression.
Such technology is now well known in the art, and
oligonucleotides or other fragments can be designed from various
locations along the sequences.

The gene of interest can be turned off in the short term by
transfecting a cell or tissue with expression vectors which
will flood the cell with sense or antisense sequences until all
copies of the vector are disabled by endogenous nucleases.
Stable transfection of appropriate germ line cells or
preferably a zygote with a vector containing the fragment will
produce a transgenic organism (U.S. Pat. No. 4,736,866, 12
Apr. 1988), which produces enough copies of the sense or
antisense sequence to significantly compromise or entirely
eliminate normal activity of the particular kinase gene.
Frequently, the function of the gene can be ascertained by
observing behaviors such as lethality, loss of a physiological
pathway, changes in morphology, etc. at the intracellular,
cellular, tissue or organismal level.
In addition to using fragments constructed to interrupt transcription of the open reading frame,
modifications of gene expression can be obtained by designing antisense sequences to promoters,
enhancers, introns, or even to trans-acting regulatory genes. Similarly, inhibition can be achieved
using Hoogsteen base-pairing methodology, also known as "triple helix" base pairing.

VIII Expression of Kinases

Expression of the kinases may be accomplished by subcloning the cDNAs into appropriate vectors and
transfecting the vectors into host cells. In some cases, the cloning vector previously used for the
generation of the tissue library also provides for direct expression of kinase sequences in E. coli.
Upstream of the cloning site, this vector contains a promoter for β-galactosidase, followed by
sequence containing the amino-terminal Met and the subsequent 7 residues of β-galactosidase.
Immediately following these eight residues is a bacteriophage promoter useful for transcription and
a linker containing a number of unique restriction sites. Induction of an isolated, transfected
bacterial strain with IPTG using standard methods will produce a fusion protein corresponding to the
first seven residues of β-galactosidase, about 5 to 15 residues which correspond to linker, and the
peptide encoded within the kinase cDNA. Since cDNA clone inserts are generated by an essentially
random process, there is one chance in three that the included cDNA will lie in the correct frame
for proper translation. If the cDNA is not in the proper reading frame, it can be obtained by
deletion or insertion of the appropriate number of bases by well known methods including in vitro
mutagenesis, digestion with exonuclease III or mung bean nuclease, or oligonucleotide linker
inclusion.

The kinase cDNA can be shuttled into other vectors known to be useful for expression of protein in
specific hosts. Oligonucleotide linkers containing cloning sites as well as a stretch of DNA
sufficient to hybridize to the end of the target cDNA (25 bases) can be synthesized chemically by
standard methods. These primers can then be used to amplify the desired gene fragments by PCR. The
resulting fragments can be digested with appropriate restriction enzymes under standard conditions
and isolated by gel electrophoresis. Alternatively, similar gene fragments can be produced by
digestion of the cDNA with appropriate restriction enzymes and filling in the missing gene sequence
with chemically synthesized oligonucleotides. Partial nucleotide sequence from more than one gene
can be ligated together and cloned in appropriate vectors to optimize expression.

Suitable expression hosts for such chimeric molecules include but are not limited to mammalian cells
such as Chinese Hamster Ovary (CHO) and human 293 cells, insect cells such as Sf9 cells, yeast cells
such as Saccharomyces cerevisiae, and bacteria such as E. coli. For each of these cell systems, a
useful expression vector may also include an origin of replication to allow propagation in bacteria
and a selectable marker such as the β-lactamase antibiotic resistance gene to allow selection in
bacteria. In addition, the vectors may include a second selectable marker such as the neomycin
phosphotransferase gene to allow selection in transfected eukaryotic host cells. Vectors for use in
eukaryotic expression hosts may require RNA processing elements such as 3' polyadenylation sequences
if such are not part of the cDNA of interest.

Additionally, some of the kinase vectors may contain native promoters which will allow induction of
gene expression in human cells such as the 293 line mentioned above. Other available promoters are
host specific and may be specifically combined with the coding region of the kinase of interest.
They include MMTV, SV40, and metallothionein promoters for CHO cells; trp, lac, tac and T7 promoters
for bacterial hosts; and alpha factor, alcohol oxidase and PGH promoters for yeast. In addition,
transcription enhancers, such as the Rous sarcoma virus (RSV) enhancer, may be used in mammalian
host cells. Once homogeneous cultures of recombinant cells are obtained through standard culture
methods, large quantities of recombinantly produced peptide can be recovered from the conditioned
medium and analyzed using methods known in the art.

IX Isolation of Recombinant KIN

KIN may be expressed as a recombinant protein with one or more additional polypeptide domains added
to facilitate protein purification. Such purification facilitating domains include, but are not
limited to, metal chelating peptides such as histidine-tryptophan modules that allow purification on
immobilized metals, protein A domains that allow purification on immobilized immunoglobulin, and the
domain utilized in the FLAGS extension/affinity purification system (Immunex Corp, Seattle, Wash.).
The inclusion of a cleavable linker sequence such as Factor XA or enterokinase (Invitrogen) between
the purification domain and the kin sequence may be useful to facilitate expression of KIN.

X Testing for Kinase Activity

The sequences in this application represent many different domains of different kinase families.
These domains (and subdomains as detailed in the background of the invention) may be utilized:
1) individually for the production of antibodies, 2) in functional groups (eg. to span a membrane),
and 3) as interchangeable, usable parts of a chimeric kinase.

The various partial cDNA sequences of this application represent the different kinase domains of the
various families (Hardie G. and Hanks S., supra), and they may be recombined in numerous ways to
produce chimeric nucleic acid molecules. For example, a known, full length kinase such as the human
map kinase of this application (Seq ID No 45) may be used to swap related portions of the nucleic
acid sequence, analogous to domains or subdomains of MAP kinase polypeptides. The chimeric
nucleotides, so produced, may be introduced into prokaryotic host cells (as reviewed in Strosberg
A. D. and Marullo S. (1992) Trends Pharma Sci 13: 95-98) or eukaryotic host cells. These host cells
are then employed in procedures to determine what molecules activate the kinase or what molecules
are activated by a kinase. Such activating or activated molecules may be of extracellular,
intracellular, biologic or chemical origin.

An example of a test system, in this case for protein tyrosine kinases, can be based on the
interaction of protein tyrosine kinases with chemokine receptors (Taniguchi T. (1995) Science
268:251-255). These receptors are capable of activating a variety of nonreceptor protein tyrosine
kinases when stimulated by an extracellular chemokine. C-X-C chemokines such as platelet factor 4,
interleukin-8, connective tissue activating protein III, neutrophil activating peptide 2, are
soluble activators of neutrophils.

A standard measure of neutrophil activation involves measuring the mobilization of Ca++ as part of
the signal transduction pathway. The experiment involves several steps. First, blood cells obtained
from venipuncture are fractionated by centrifugation on density gradients. Enriched populations of
neutrophils are further fractionated on columns by negative selection using antibodies specific for
other blood cell types. Next, neutrophils are transformed with an expression vector containing the
kinase nucleic acid sequence of interest and preloaded fluorescent probe whose emission
characteristics have been altered by Ca++ binding. Or in the alternative, the neutrophil is
preloaded with the
purified kinase of interest and fluorescent probe. Then, when the cells are exposed to an
appropriate chemokine, the chemokine receptor activates the kinase which, in turn, initiates Ca++
flux. Ca++ mobilization is observed and measured using fluorometry as has been described in
Grynkiewicz G. et al (1985) J Biol Chem 260:3440, and McColl S. et al (1993) J Immunol
150:4550-4555, incorporated herein by reference.

XI Identification of or Production of Kinase Specific Antibodies

Purified KIN is used to screen a pre-existing antibody library or to raise antibodies, using either
polyclonal or monoclonal methodology. For polyclonal antibody production, denatured peptide from the
reverse phase HPLC separation is obtained in quantities up to 75 mg. This denatured protein can be
used to immunize mice or rabbits using standard protocols; about 100 micrograms are adequate for
immunization of a mouse, while up to 1 mg might be used to immunize a rabbit. In identifying mouse
hybridomas, the denatured protein can be labelled and used to screen potential murine B-cell
hybridomas for those which produce antibody. This procedure requires only small quantities of
protein, such that 20 mg would be sufficient for labelling and screening of several thousand clones.

For monoclonal antibody production, the amino acid sequence as deduced from translation of the cDNA,
is analyzed to determine regions of high immunogenicity. Peptides comprising appropriate hydrophilic
regions are expressed from recombinant cDNA or synthesized and used in suitable immunization
protocols to raise antibodies. Selection of appropriate epitopes is described by Ausubel F. M. et al
(supra). The optimal amino acid sequences for immunization are usually located at the C-terminus or
N-terminus and in intervening, hydrophilic regions of the polypeptide which are likely to be exposed
to the external environment when the protein is in its natural conformation.

Typically selected oligopeptides, about 15 residues in length, are synthesized using an Applied
Biosystems Peptide Synthesizer Model 431A using fmoc-chemistry and coupled to keyhole limpet
hemocyanin (KLH, Sigma) by reaction with M-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS; Ausubel
F. M. et al, supra). If necessary, a cysteine may be introduced at the N-terminus of the peptide to
permit coupling to KLH. Rabbits are immunized with the peptide-KLH complex in complete Freund's
adjuvant. The resulting antisera are tested for antipeptide activity by binding the peptide to
plastic, blocking with 1% bovine serum albumin, reacting with antisera, washing and reacting with
labelled, affinity purified, specific goat anti-rabbit IgG.

Hybridomas may also be prepared and screened using standard techniques. Hybridomas of interest are
detected by screening with labelled KIN to identify those fusions producing the monoclonal antibody
with the desired specificity. In a typical protocol, wells of plates (FAST; Becton-Dickinson Palo
Alto, Calif.) are coated during incubation with affinity purified, specific rabbit anti-mouse (or
suitable anti-species Ig) antibodies at 10 mg/ml. The coated wells are blocked with 1% BSA, washed
and incubated with supernatants from hybridomas. After washing the wells are incubated with labelled
KIN at 1 mg/ml. Supernatants with specific antibodies bind more labelled KIN than is detectable in
the background. Then clones producing specific antibodies are expanded and subjected to two cycles
of cloning at limiting dilution. Cloned hybridomas are injected into pristane-treated mice to
produce ascites, and monoclonal antibody is purified from mouse ascitic fluid by affinity
chromatography on Protein A. Monoclonal antibodies with affinities of at least 10^8/M, preferably
10^9 to 10^10 or stronger, will typically be made by standard procedures as described in Harlow and
Lane (1988) Antibodies: A Laboratory Manual, Cold Spring Harbor Laboratory, Cold Spring Harbor,
N.Y.; and in Goding (1986) Monoclonal Antibodies: Principles and Practice, Academic Press, New York
City, both incorporated herein by reference.

XII Diagnostic Assays Using KIN Specific Antibodies

Particular KIN antibodies are useful for investigation of various disorders or diseases which may be
characterized by differences in the amount or distribution of KIN. Given the usual role of the
kinases, KIN might be expected to be upregulated (or downregulated) in its involvement in activation
of signal cascades.

Diagnostic assays for KIN include methods utilizing the antibody and a reporter molecule to detect
KIN in human body fluids, membranes, cells, tissues or extracts thereof. The antibodies of the
present invention may be used with or without modification. Frequently, the antibodies will be
labelled by joining them, either covalently or noncovalently, with a substance which provides for a
detectable signal. A wide variety of reporter molecules and conjugation techniques are known and
have been reported extensively in both the scientific and patent literature. Suitable reporter
molecules or labels include those radionuclides, enzymes, fluorescent, chemi-luminescent, or
chromogenic agents previously mentioned as well as substrates, cofactors, inhibitors, magnetic
particles and the like. Patents teaching the use of such labels include U.S. Pat. Nos. 3,817,837;
3,850,752; 3,939,350; 3,996,345; 4,277,437; 4,275,149; and 4,366,241. Also, recombinant
immunoglobulins may be produced as shown in U.S. Pat. No. 4,816,567, incorporated herein by
reference.

A variety of protocols for measuring soluble or membrane-bound KIN, using either polyclonal or
monoclonal antibodies specific for the protein, are known in the art. Examples include enzyme-linked
immunosorbent assay (ELISA), radioimmunoassay (RIA) and fluorescent activated cell sorting (FACS). A
two-site monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two
non-interfering epitopes on KIN is preferred, but a competitive binding assay may be employed. These
assays are described, among other places, in Maddox, D. E. et al (1983, J Exp Med 158:1211).

XIII Purification of Native KIN Using Antibodies

Native or recombinant protein kinases can be purified by immunoaffinity chromatography using
antibodies specific for that particular KIN. In general, an immunoaffinity column is constructed by
covalently coupling the anti-KIN antibody to an activated chromatographic resin.

Polyclonal immunoglobulins are prepared from immune sera either by precipitation with ammonium
sulfate or by purification on immobilized Protein A (Pharmacia Biotech). Likewise, monoclonal
antibodies are prepared from mouse ascites fluid by ammonium sulfate precipitation or chromatography
on immobilized Protein A. Partially purified immunoglobulin is covalently attached to a
chromatographic resin such as CnBr-activated Sepharose (Pharmacia Biotech). The antibody is coupled
to the resin, the resin is blocked, and the derivative resin is washed according to the
manufacturer's instructions.

Such immunoaffinity columns may be utilized in the purification of KIN by preparing a fraction from
cells containing KIN in a soluble form. This preparation may be derived by solubilization of whole
cells or of a subcellular fraction obtained via differential centrifugation (with or without
addition of detergent) or by other methods well
known in the art. Alternatively, soluble KIN containing a signal sequence may be secreted in useful
quantity into the medium in which the cells are grown.

A soluble KIN-containing preparation is passed over the immunoaffinity column, and the column is
washed under conditions that allow the preferential absorbance of KIN (eg, high ionic strength
buffers in the presence of detergent). Then, the column is eluted under conditions that disrupt
antibody/KIN binding (eg, a buffer of pH 2-3 or a high concentration of a chaotrope such as urea or
thiocyanate ion) and KIN is collected.

XIV Drug Screening

This invention is particularly useful for screening therapeutic compounds by using binding fragments
of KIN in any of a variety of drug screening techniques. The molecules to be screened may be of
extracellular, intracellular, biologic or chemical origin. The peptide fragment employed in such a
test may either be free in solution, affixed to a solid support, borne on a cell surface or located
intracellularly. One may measure, for example, the formation of complexes between KIN and the agent
being tested. Alternatively, one can examine the diminution in complex formation between KIN and a
receptor caused by the agent being tested.

Methods of screening for drugs or any other agents which can affect signal transduction comprise
contacting such an agent with KIN fragment and assaying for the presence of a complex between the
agent and the KIN fragment. In such assays, the KIN fragment is typically labelled. After suitable
incubation, free KIN fragment is separated from that present in bound form, and the amount of free
or uncomplexed label is a measure of the ability of the particular agent to bind to KIN.

Another technique for drug screening provides high throughput screening for compounds having
suitable binding affinity to the KIN polypeptides and is described in detail in European Patent
Application 84/03564, published on Sep. 13, 1984, incorporated herein by reference. Briefly stated,
large numbers of different small peptide test compounds are synthesized on a solid substrate, such
as plastic pins or some other surface. The peptide test compounds are reacted with KIN fragment and
washed. Bound KIN fragment is then detected by methods well known in the art. Purified KIN can also
be coated directly onto plates for use in the aforementioned drug screening techniques. In addition,
non-neutralizing antibodies can be used to capture the peptide and immobilize it on the solid
support.

This invention also contemplates the use of competitive drug screening assays in which neutralizing
antibodies capable of binding KIN specifically compete with a test compound for binding to KIN
fragments. In this manner, the antibodies can be used to detect the presence of any peptide which
shares one or more antigenic determinants with KIN.

XV Identification of Molecules Which Interact with KIN

The inventive purified KIN is a research tool for identification, characterization and purification
of interacting, signal transduction pathway proteins. Appropriate labels are incorporated into KIN
by various methods known in the art and KIN is used to capture soluble or interact with
membrane-bound molecules. A preferred method involves labeling the primary amino groups in KIN with
125I Bolton-Hunter reagent (Bolton, A. E. and Hunter, W. M. (1973) Biochem J 133:529). This reagent
has been used to label various molecules without concomitant loss of biological activity (Hebert
C. A. et al (1991) J Biol Chem 266:18989-94; McColl S. et al (1993) J Immunol 150:4550-4555).
Membrane-bound molecules are incubated with the labelled KIN molecules, washed to remove unbound
molecules, and the KIN complex is quantified. Data obtained using different concentrations of KIN
are used to calculate values for the number, affinity, and association of KIN with the signal
transduction complex.

Labelled KIN fragments are also useful as a reagent for the purification of molecules with which KIN
interacts, specifically including inhibitors. In one embodiment of affinity purification, KIN is
covalently coupled to a chromatography column. Cells and their membranes are extracted, KIN is
removed and various KIN-free subcomponents are passed over the column. Molecules bind to the column
by virtue of their KIN affinity. The KIN-complex is recovered from the column, dissociated and the
recovered molecule is subjected to N-terminal protein sequencing. The amino acid sequence is then
used to identify the captured molecule or to design degenerate oligomers for cloning its gene from
an appropriate cDNA library.

In an alternate method, monoclonal antibodies raised against KIN fragments are screened to identify
those which inhibit the binding of labelled KIN. These monoclonal antibodies are then used in
affinity purification or expression cloning of associated molecules. Other soluble binding molecules
are identified in a similar manner. Labelled KIN is incubated with extracts or other appropriate
materials derived from rheumatoid synovium. After incubation, KIN complexes (which are larger than
the lone KIN fragment) are identified by a sizing technique such as size exclusion chromatography or
density gradient centrifugation and are purified by methods known in the art. The soluble binding
protein(s) are subjected to N-terminal sequencing to obtain information sufficient for database
identification, if the soluble protein is known, or for cloning, if the soluble protein is unknown.

XVI Use and Administration of Antibodies or Other Inhibitory Molecules

Antibodies, inhibitors, receptors or antagonists of KIN fragments (or other treatments to limit
signal transduction, TST), can provide different effects when administered therapeutically. TSTs
will be formulated in a nontoxic, inert, pharmaceutically acceptable aqueous carrier medium
preferably at a pH of about 5 to 8, more preferably 6 to 8, although the pH may vary according to
the characteristics of the antibody, inhibitor, or antagonist being formulated and the condition to
be treated. Characteristics of TSTs include solubility of the molecule, half-life and
antigenicity/immunogenicity; these and other characteristics may aid in defining an effective
carrier. Native human proteins are preferred as TSTs, but organic or synthetic molecules resulting
from drug screens may be equally effective in particular situations.

TSTs may be delivered by known routes of administration including but not limited to topical creams
and gels; transmucosal spray and aerosol; transdermal patch and bandage; injectable, intravenous and
lavage formulations; and orally administered liquids and pills particularly formulated to resist
stomach acid and enzymes. The particular formulation, exact dosage, and route of administration will
be determined by the attending physician and will vary according to each specific situation.

Such determinations are made by considering multiple variables such as the condition to be treated,
the TST to be administered, and the pharmacokinetic profile of the particular TST. Additional
factors which may be taken into account include disease state (e.g. severity) of the patient, age,
weight, gender, diet, time and frequency of administration, drug combination, reaction
sensitivities, and tolerance/response to therapy. Long acting TST formulations might be administered
every 3 to 4 days, every week, or once every two weeks depending on half-life and clearance rate of
the particular TST.

Normal dosage amounts may vary from 0.1 to 100,000 micrograms, up to a total dose of about 1 g,
depending upon the route of administration. Guidance as to particular dosages and methods of
delivery is provided in the literature. See U.S. Pat. No. 4,657,760; 5,206,344; or 5,225,212. Those
skilled in the art will employ different formulations for
different TSTs. Administration to cells such as nerve cells 
necessitates delivery in a manner different from that to other 
cells such as vascular endothelial cells. 

It is contemplated that disorders or diseases which trigger 
defensive signal transduction may precipitate damage that is 
treatable with TSTs. These disorders or diseases may be 
specifically diagnosed by the tests discussed above, and such 
testing should be performed in cases where physiologic or 
pathologic problems are suspected to be associated with 
abnormal signal transduction. 

All publications and patents mentioned in the above 
specification are herein incorporated by reference. Various 
modifications and variations of the described method and
system of the invention will be apparent to those skilled in 
the art without departing from the scope and spirit of the 
invention. Although the invention has been described in 
connection with specific preferred embodiments, it should 
be understood that the invention as claimed should not be 
unduly limited to such specific embodiments. Indeed, vari- 
ous modifications of the above-described modes for carrying 
out the invention which are obvious to those skilled in the 
field of molecular biology or related fields are intended to be 
within the scope of the following claims. 



TABLE 1

Clone     Library               
297       U937                  POOS 40 Mouse protooncogene ser/thr kinase
1622      U937                  HUMCLK3B clk3 gene product
10007     THP-1 Phorbol LPS     HSPLK1 protein kinase
12702     THP-1 Phorbol LPS     RATSGPK ser/thr kinase
23789     Inflamed Adenoid      CHKFRNK chicken tyr kinase
35652     HUVEC                 KEK5 Chicken Y kinase receptor
35855     HUVEC                 HUMANBTK37 tyr kinase
40194     T + B Lymphoblast     KRB1 VARV Variola virus protein kinase
42170     T + B Lymphoblast     HSU09564 serine kinase
46081     Corneal Stroma        YSCKIN1 yeast protein kinase
46651     Corneal Stroma        CDK4, P11802
53840     Fibroblast            HSDAPK, Death-associated protein kinase
54065     Fibroblast            SCPROKIN 1 yeast 35.6 kD
56494     Fibroblast            KLMC RAT, myosin light chain kinase
58029     Skeletal Muscle       ATHCTRIA 1 A. Thaliana Y kinase receptor
64663     Placenta              KIN3 Yeast protein kinase P22209
67967     HUVEC Shear Stress    YAKJ Yeast protein kinase
68963     HUVEC Shear Stress    KATK Human Y kinase
71904     Placenta              KIN3 P22209SwP
75289     THP-1 Phorbol         HSU08023 Avian retrovirus rpl30
81865     Rheumatoid Synovium   SNF1 Yeast C catabolite derepressing
82056     HUVEC Shear Stress    P34314 C. elegans ser/thr kinase
108485    AML Blast             KAPA Pig cAMP-dependent protein kinase
114973    Testis                CC2B ARATH Mouse-ear cress cdc
118591    Skeletal Muscle       PB0192 mixed lineage kinase 1
119819    Skeletal Muscle       HSU09564 ser kinase
120376    Skeletal Muscle       U01064 Y kinase
132750    Bone Marrow           MLK2 mixed lineage kinase 2
140052    T Lymphocyte          G-protein coupled receptor kinase
146392    T Lymphocyte          SCYAK1 Yeast Yak1 kinase
156108    THP-1 Phorbol LPS     U01064 Dictyostelium Y kinase
173627    Bone Marrow           MMU14166 Kiz
181971    Placenta              HUMTKR Y kinase receptor
182538    Placenta              HSNEK2R kinase
184416*   Cardiac Muscle        KPKS Human proto-oncogene Ser/Thr kinase
191283    Rheumatoid Synovium   RATSGPK Ser/Thr kinase
192268    Rheumatoid Synovium   ATHAPK1A Ser/Thr kinase
214915    Stomach               XLMPK2K Map kinase
223163    Pancreas              TGF-β receptor ser/thr kinase
237002    Small Intestine       P16227 Mouse Y kinase blk
239990    Hippocampus           SHC Human transforming protein
240142    Hippocampus           HSNEK2R
275781    Testes                BOVCKIA casein kinase
285465    Eosinophils           DDIMLCK myosin light chain kinase



SEQUENCE LISTING

( 1 ) GENERAL INFORMATION:

( i i i ) NUMBER OF SEQUENCES: 45

( 2 ) INFORMATION FOR SEQ ID NO:1:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 526 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: U937
( B ) CLONE: 297

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:1:

ACAAGGGTTG TAATTAAAGG CGATTTTGAA ACAATTAAAA TCTGTGATGT AGGAGTCTCT  60
CTACCACTGG ATGAAAATAT GACTGTGACT GACCCTGAGG CTTGTTACAT TGGCACAGAG  120
CCATGGAAAC CCAAAGAAGC TGTGGAGGAG AATGGTGTTA TTACTGCAAG GCAGACATAT  180
TTGCCTTTGG CTTACTTTGT GGGAAATGAT GACTTTATCG ATTCCACACA TTAATCTTTC  240
AAATGATGAT GATGATGAAG TAAAAACTTT TTGATGAAAA GTAATTTTGA TGTTGAAGCA  300
TTACTATGCA AGCCCTTTOG ACCTAAGGCC ACCCTATTTT AATATTGGAG GACCTTGGTG  360
AATCATACCC AGGAAGGTAA TTTGACCTCT TCTCTGATCA CCCTTATTGA AGCCCCCAAG  420
CACCCTTCTT GTGACAATTT TAGGTTGGAC CACTTGCTTT GGGCCAACTT AACTAAAGTT  480
GTTCGAAAAA CTTTTTTCCA AAAATTTCCA TAGGCCTCCC AAGTTT  526
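
Because each entry in the listing declares a LENGTH and then prints the bases in groups of ten, a printed sequence can be checked against its declared length. The following is a minimal sketch under stated assumptions: the helper names `clean_sequence` and `check_length` are hypothetical, and the rule that whitespace and digits are layout (block spacing and running position counters) rather than sequence is an assumption about this printed format.

```python
import re

# Minimal sketch (hypothetical helpers): compare a printed nucleotide
# sequence with the LENGTH declared in its SEQUENCE CHARACTERISTICS.


def clean_sequence(raw: str) -> str:
    """Remove whitespace and digits, which are assumed to be layout only."""
    return re.sub(r"[\s\d]", "", raw).upper()


def check_length(raw: str, declared: int):
    """Return (length matches, observed length, unexpected characters)."""
    seq = clean_sequence(raw)
    # Anything outside A, C, G, T, N is reported so it can be reviewed by hand.
    unexpected = sorted(set(seq) - set("ACGTN"))
    return len(seq) == declared, len(seq), unexpected


if __name__ == "__main__":
    printed = "ACAAGGGTTG TAATTAAAGG  20\nCG ATT"
    print(check_length(printed, 25))
```

A character that is not a nucleotide code still counts toward the length, so a misread base shows up in the third element of the result rather than as a length mismatch.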

( 2 ) INFORMATION FOR SEQ ID NO:2:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 378 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: U937
( B ) CLONE: 1622

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:2:

AGAACACCAC ATCCGAGTGG CTGACTTTGG CAGTGCCACA TTTGACCATG AGCACCACAC  60
CACCATTGTG GCCACCCGTC ACTATCGCCG CCTGAGGTGA TCCTTGAGCT GGGCTGGGCA  120
CAGCCTGGTG ACGTCTGGGC ATTGGCTGCA TTCTCTTTGA GTACTACCGG GGCTTCACAC  180
TCTTCCAGAC CCACGAAAAC CGAGAGCACC TGGTGATGAT GGAGAAGATC CTAGGGCCCA  240
TCCCATCACA CATGATCCAC CGTACCAGGA ACCAGAATAT TTCTACAAAG GGGGCCTAGT  300
TTGGGATGGA CAGCTCTTAC GGCCGGTATG TAAGGGACTC AAACCTTTAA GGTTCATGTT  360
CAAGCTTCCT GGGAAGTG  378
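
The specification notes that a cDNA insert has one chance in three of lying in the correct reading frame. As a hedged illustration of that point, the sketch below translates a sequence in each of the three forward frames using the standard genetic code and reports the longest stop-free stretch in each. The function names, the use of "X" for unreadable codons, and the short demonstration sequence are assumptions made for this example; they are not part of the listing.

```python
from itertools import product

# Minimal sketch (hypothetical helpers): translate a cDNA in the three
# forward reading frames with the standard genetic code.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons enumerated in TCAG order line up with the amino acid string above.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop codon."""
    seq = "".join(seq.split()).upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons containing a character outside A, C, G, T are shown as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_stretch(protein: str) -> int:
    """Length of the longest run of residues uninterrupted by a stop codon."""
    return max(len(part) for part in protein.split("*"))


if __name__ == "__main__":
    demo = "CATGGCCTAA"  # made-up sequence, not taken from the listing
    for frame in range(3):
        protein = translate(demo, frame)
        print(frame, protein, longest_open_stretch(protein))
```

Comparing the three outputs shows how a one-base shift changes the encoded peptide, which is why insertion or deletion of the appropriate number of bases restores the intended frame.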

( 2 ) INFORMATION FOR SEQ ID NO:3:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 326 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: THP-1 Phorbol LPS
( B ) CLONE: 10007

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:3:

GGGCTGGCAG CCCGGTTGGA GCCTCCGGAC CAGAGGAAGA AGACCATCTT GGCACCCCCA  60
ACTATGTGGC TCCAGAAGTG CTGCTGAGAC AGGGCCACGG CCCTGAGGCG GATGTATGGT  120
CACTGGGCTG TGTCATGTAC ACGCTGCTCT GCGGGACCCT CCCTTTGAGA CGGCTGACCT  180
GAAGGAGACG TACCGCTGCA TCAAGAAGGT TCACTACAAC GGTGCCTGCC AGCTCTTAAT  240
OCCTGCCCGA GTCCTTGGCC GCAATCCTTC GGGCCTTAAC CCGAGAACCG GCCCTCTATT  300
GACAGATCCT TGCGGCAATT AACTTT  326



( 2 ) INFORMATION FOR SEQ ID NO:4:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 257 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: THP-1 Phorbol LPS
( B ) CLONE: 12702

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:4:

CCGCAAGACA CCTCCTGGAG GGCCTCCTGA GAAGGACAGG CAAAGGGCTG GGCCAAGGAT  60
GACTTCATGG AGATTAAGAG TCATGTTTCT TCTCCTTAAT TAACTGGGAT GATCTCATTA  120
ATAAGAAGAT TACTCCCCCT TTTACCCAAA TGTGAGTGGG CCCAACGCCT ACGGACTTTG  180
CCCCGAGTTT ACGAAGAGCC TTCCCCAATC CATTGGAAGT CCCCTGAAAG GTCCTATACA  240
AGTCAGTTAA GGAAGTT  257



( 2 ) INFORMATION FOR SEQ ID NO:5:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 252 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: Inflamed Adenoid
( B ) CLONE: 23789

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:5:

GTGAAGAATG TGGGGCTGAC CCTCGGAAGT CATCGGGAGC GTGGATGATC TCCTGCCTTC  60
CTTGCCGTCA TCTCACGGAC AGAGATCGAG GGCACCCAGA AACTGCTCAA CAAAGACCTG  120
GCAGAGCTCA TCAACAAGAT GCGCTGGCGC AAGAACGCGT GACCTCCCTG TAGGAGTAAG  180
AGGCAGATCT GACGGTTCAC AACCCTGGCT GTGACGCAAG AACCTCTTAC GTGTGCCAGG  240
CCCAAAGTTC TO  252



( 2 ) INFORMATION FOR SEQ ID NO:6:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 255 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:
( A ) LIBRARY: HUVEC
( B ) CLONE: 35652

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CAAAATCGTG GCCCGGAGAA TGGCGGGGCC TCAACCCTCT CCTGGACCAG CGGCAGCTCA  60
CTACTCAGCT TTTGGCCTGT GGGCGAGTGG CTTCGGGCCA TCAAAATGGG AAGATACGAA  120
GAAAGTTTCG CAGCCGCTGG CTTTGGCTCC TTCAGCTGGT CAGCCAGATC TCTGCTGAGG  180
ACCTGCTCCG AATCGAGTCA CTCTGGCGGG ACACCAGAAG AAAATTTGGC CAGTTCCAGC  240
ACATGAGTCC CAGGT  255



( 2 ) INFORMATION FOR SEQ ID NO:7: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 238 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Huvec
( B ) CLONE: 35855 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

GAATACCCCA TATACATAGT GACTGATATA TAAGCAATGG CTGCTTGCTG AATACCTGAG

GAGTCACGGA AAAGGCTTAA CCTTCCCAGT CTTAGAAATG TGCTACGATG TCTGTAAGGC

ATGGCCTTCT TGGAGAGTCA CCAATTCATA CACCGGGCTT GGCTGCTCGT AACTGCTTCG

TGGACAGAGA TCTCTGTGTG AAAGTTCTCC ATTTGGATGA CAAGGTATGT TCTTGATG



( 2 ) INFORMATION FOR SEQ ID NO:8: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 261 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: T+B Lymphoblast 
( B ) CLONE: 40194 



( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:8: 



AAACAACTTG ATTATTTAGG AATTCCTCTG TTTTATGGAT CTGGTCTGAC TGAATTCAAG 60

GGAAGAAGTT ACAGATTTAT GGTAATGGAA AGACTAGGAA TAGATTTACA GAAGATCTCA 120

GGCCAGAATG GTACCTTTAA AAAGTCAACT GTCCTGCAAT TAGGATCCGA ATGTTGGATG 180

TACTGGAATA TATACATGAA AATGAATATG TTCATGGTGA TATAAAAGCA GCAAATCTAC 240

TTTTGGGTTA CAAAAATCCT T 261



( 2 ) INFORMATION FOR SEQ ID NO:9: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 242 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: T+B Lymphoblast 
( B ) CLONE: 42170 



( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:9:

TAAGAAACCT GAAGATCGAG CCACTGCTGA AGAATGTCTA AAGCACCCCT GGTTGACACA 60

GAGCAGTATT CAAGAGCCTT CTTTCAGGAT GGAAAAGGCA CTAGAAGAAG CAAATGCCCT 120

CCAAGAAGGT CATTCTGTGC CTGAAATTAA TTCGGATACC GACAAATCAG AAACCGAGGA 180

ATCCATTGTA ACCGAAGAGT TAATfGTAGT TACTTCATAT ACTCTAGGGC AATGCAGACA 240

GT 242



( 2 ) INFORMATION FOR SEQ ID NO:10:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 222 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Corneal Stroma 
( B ) CLONE: 46081 



( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GCAAAGGACA GTCCGCCGAG GTGCTCGGTG GAGTCATGGC ATTCCCTTTT GGAAGACTGG 60

CCTTGGTGCA AACCCTGGAG AAGGTGCCTA TGGAGAAGTT CAACTTGCTG TAAATAGAGT 120

AACTAAGAAG CAGTCGCAGT GAAGATTTAG ATATAAGCGT GCCGTAGACT GTCCCGAAAA 180

TATTAAGTAG ATCTGTATCA ATAAAATGCT AATCATGAAA TT 222



( 2 ) INFORMATION FOR SEQ ID NO:11:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 225 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Corneal Stroma 
( B ) CLONE: 46651 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:11:

ATGCTCCGCC AGTGAGAAGG OCGGCTGCCT GAGCGCCTCA CCAGTCCTCA TCACCCAGAT 60

CCTGTGGCTT TGAGACACCT TCACTTAAGA ACATTTGCCA CTTGACTTAA ACCAGAAACG 120

TGTTTTGTGG CATCAGCAGA CCCTTTCTCA GGTAAGTTGT GCTTTGCTTT TAGCATACGT 180 

GAGAAGTTGT TCCGCTCCAT TTTGTGGGAC GTCTTTCTTT CCTTG 225 



( 2 ) INFORMATION FOR SEQ ID NO:12: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 256 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Fibroblast 
( B ) CLONE: 53840 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:12: 

CAGCGCCTTA CATCTCGCAG CCAAGAACAG CCACCATGAA TGCATCAGGA AGCTGCTTCA 60 

TCTAAATGCC CAGCCGAAAG TTTTGACAGC TCTGGGAAAA CAGCTTTACA TTATGCAGCG 120

GCTCAGGGCT GCCTTCAAGC TGTGCAGATT CTTGCGAACA CAAGAGCCCC ATAAACCTCA 180

AAGATTTGGA TGGGAATATA CCGCTGCTGC TTGCTGTACA AAATGGTCAC AGTGAGATCT 240

GTCACTTTTC CTGGTC 256

( 2 ) INFORMATION FOR SEQ ID NO: 13: 

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 240 base pairs
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Fibroblast 
( B ) CLONE: 54055 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:13: 

GTTGACATCT GGTCCCTGGG CATATGGCCA TCGAAATGAT TGAAGGGGAO CCTCATACCT 60

CAATGAAAAC CCTTGAGAGC CTTGTACCTC ATTGCCACCA ATGGGACCCC AGAACTTCAG 120

AACCCAGAGA AGCTGTCAGC TATCTTCCGG GACTTTCTGA ACCGCTGTCT CGAGATGGAT 180

GTGGAGAAGA GAGGTTCAGC TAAAGACCTG CTACAGCATC AATTCCTGAA GATTGCCAAT 240 

( 2 ) INFORMATION FOR SEQ ID NO: 14: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 195 base pairs
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Fibroblast 
( B ) CLONE: 56494 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:14: 

AACAGTGAAG AGCTCCGAGA AATTATGGCT ACCCTGATAT GTGGCTCCTG AAATTTAGTT 60

ATGATCCTAT AAGCATGGCA ACAGATATTG GAGCATTGGA GTGTTAACAT ATGTCATGCT 120

TACAGGAATA TCACCTTTTT AGGCAATGAT AAACAAGAAA CATTCTTAAA CATCTCACAG 180

ATGATTTTAA GTTAT 195 

( 2 ) INFORMATION FOR SEQ ID NO:15: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 207 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Skeletal Muscle 
( B ) CLONE: 58029 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:15:

GGAGTGTTTA TCGAGCCAAA TGGATATCAC AGGACAAGGA GGTGGCTGTA AAGAAGCTCC 60

TCAAAATAGA GAAAGAGGCA GAAATACTCA GTCTCCTCAG TCACAGAAAC ATCATCCAGT 120

TTTATGGAGT AATTTTGAAC CTCCCAACTA TGGCATTGTC ACAGAATATG CTTCTTGGGT 180

CACTCTATGA TTACATTAAC AGTACAA 207

( 2 ) INFORMATION FOR SEQ ID NO:16:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 184 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Placenta 
( B ) CLONE: 64663 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:16: 

CGGGGTGGTA AAACTTGGAG ATCTTGGGAT TGGCGGTTTT AGCTCAAAAA CCACAGCTGC 60 

ACATTCTTTA GTTGGTACGC CTATTCATGT TCCAGAGGAT ACAGAAATGG ATACAACTTC 120 

AAATCTCATC TGGTCTCTTG GCTGTCTACT ATATGGATGG CTGCATTACA AAGTCCTTTC 180

TATG 184

( 2 ) INFORMATION FOR SEQ ID NO:17: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 206 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: HUVEC Sheer Stress
( B ) CLONE: 67967

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:17:

TGAATTGCTG AGCATAGACC TTTATGAGCT GATTAAAAAA AATAAGTTTC AGGTTTTAGC 60 

GTCCAGTTGG TACGCAAGTT TGCCCAGTCC ATCTTGCAAT CTTTGGTGCC CTCCACAAAA 120 

TAAGATTATT CACTGCGATC TGAGCCAGAA AACATTCTCC TGAAACACCA CGGGCGCAGT 180

TCAACCAAGG TCATTGACTT TGGGTT 206 

( 2 ) INFORMATION FOR SEQ ID NO: 18: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 268 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: HUVEC Sheer Stress 
( B ) CLONE: 68963 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:18:

GGGAAGTGGC CAGTTTGGAG TGGTCAGCTG GGCAAGTGGA AGGGGCAGTA TGATGTTGCT 60

GTTAAGATGA TCAAGGAGGG CTCCATCTCA GAAGATGAAT TTTTCAGGAG GCCCAGACTA 120

TATGAAACTC AGCCATCCCA AGCTGGTTAA ATTCTATGGA GTGTGTTAAA CGATTACCCC 180

ATATACATGT GACTAATATA TAGCAATGCT TGCTTTTCTG AATTACCTGG GGAGTCACGG 240 

AAAAAGGACT TTTAACCCTT CCCGCTTG 268

( 2 ) INFORMATION FOR SEQ ID NO:19:

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 224 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Placenta 
( B ) CLONE: 71904 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:19:

CCTGGGGTGG TAAAACTTGG AGACTTGGCT TGGCCGGTTT TCCACCTCAA AAACCACAGC 60

TGCACATCCT TTAGTTGGTA CGCCTTATTA CATGTTCCAG AGAGATACAT GAAAATGGAT 120

ACAACTCAAA CTGACATCTG GCCTTTGGCT GTTACTATAT GAATGGCTGC TTACAAAGCC 180

TTCCTATGGT GACAAAATGA TTTTACTCAT TGTGTAAGAG ATAG 224

( 2 ) INFORMATION FOR SEQ ID NO:20: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 195 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: THP-1 Phorbol
( B ) CLONE: 75289

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:20:

GCGGGGAATG ACTCCCTATC CTGGGGTCCA GAACCATGAG ATGTATGATA TCTTCTCCAT 60 

GGCCACAGGT TGAAGCAGCC CGAAGACTGC CTGGTGAACT GTATGAAATA ATGTACTCTT 120 

GCTGCAGAAC CGATCCCTTA GACCGCCCCA CCTTTTCATA TTGAGGCTGC AGCTAGAAAA 180

ACTCTTAGAA AGTTT 195 

( 2 ) INFORMATION FOR SEQ ID NO:21:

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 219 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Rheumatoid Synovium 
( B ) CLONE: 81865 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:21: 

CACACGAGAA GCAGAAACAC GACGGGCGGG TAAGATCGGC CACTACATTC TGGTGACACG 60

CTGGGGGTCG GCACCTTCGG CAAAGTGAAG GTTGGCAAAC ATGATTGACT GGCATAAAGT 120

AGCTGTAAGA TACTCATCGA CAGAAGATTC GGAGCCTTGA TGTGGTAGGA AAAATCCCAG 180

GAAATTCAGA ACCTCAAGCT TTTCAGGCAT CCTCATATA 219

( 2 ) INFORMATION FOR SEQ ID NO:22: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 181 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: HUVEC Sheer Stress
( B ) CLONE: 82056

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:22:

CCACCAAAGA TCTCAAATAA AGTTGATGTG TGGTCGGTGG GTGTATCTCT ATCAGTGTCT 60

TTATGGAAGG AAGCCTTTTG GCCATAACCA GTCTCAGCAA GACATCCTAC AAGAGAATAC 120

GATTTTAAAG CTACTGAAGT GCAGTTCCCG CCAAAGCCAG TAGTAACACC TGAAGCAAAG 180

G 181 

( 2 ) INFORMATION FOR SEQ ID NO:23: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 218 base pairs
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: AML Blast 
( B ) CLONE: 108485 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

TATGGTTATA TGGAAGAGAA TGTGACTGGT GGTCGGTTGG GGTATTTTTA TACCAAATGC 60 

TTGTAGGTGA TACACCTTTT TATGCAGATT CTTTGGTTGG AACTTACAGT AAAATTATGA 120 

ACCATAAAAA TTCACTTACC TTTCCTGATG ATAATGACAT ATCAAAAGAA GCAAAAAACC 180 

TTATTTGTGC CTTCCTTACT GACAGGGAAG TGAGGTTA 218

( 2 ) INFORMATION FOR SEQ ID NO:24: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 264 base pairs
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Testis 
( B ) CLONE: 114973 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

GACGGTGGCC ATTTGACATG TGGAGCCTGC GTGCATCACG GTGGAGTTGT ACACGGGCTA 60

CCCCCTGTTC CCCGGGAGAA TGAGGTGGAG CAGCTGGCCT GCATCATGGA GGTGCTGGGT 120

CTGCCGCCAG CCGGCTTCAT TCAGACAGCC TCCAGGAGAC AGACATTCTT TGATTCCAAA 180

GGTTTTCCTA AAAATATAAC CACAACCAGG GGAAAAAAAG ATTCCAGATT CCAAGGGCCC 240 

TCACGGATTG GTGCTGAAAA AACT 264 

( 2 ) INFORMATION FOR SEQ ID NO:25: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 236 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Skeletal Muscle
( B ) CLONE: 118591

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:25:

GACTOAGGAC ACTGAAACAT CATCCAGTTT TATGGAGTAA TTCTTGAACC TCCCAACTAT 60

GGCATTGTCA CAGAATATGC TTCTCTGGGA TCACTCTATG ATTACATTAA CAGTAACAGA 120

AGTGAGGAGA TGGATATGGT CACATTATGA CCTGGGCCAC TGATGTAGCC AAAGGAATGC 180 

ATTATTTACA TATGGGGCTC CTGTCAAGGT GATTCACAGA GACCTCAAGT CAAGGA 236 

( 2 ) INFORMATION FOR SEQ ID NO:26: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 200 base pairs
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Skeletal Muscle
( B ) CLONE: 119819 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

CCTCCATGGC CTTCGAGCTG GCCACTGGTG ACTACCTGTT CGAGCCGCAT TCTGGAGAAG 60 

ACTACAGTCG TGATGAGGGT AAGGGGTGAG GGCTCTGGGC TCAGCCTCCC GGCCTCCCGG 120 

CCTGCCTGCC CCCAACCTCC TCTTTTGCCC ACAGACCACA TCGCTCACAT AGTGGAGCTT 180

CTGGGGGACA TCCCCCCAGC 200 

( 2 ) INFORMATION FOR SEQ ID NO:27: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 217 base pairs
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Skeletal Muscle
( B ) CLONE: 120376

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:27:

GATTACAAGT AGCTTGGTTG TAGTGGAAAA AAACGAGAGA TTAACCATTC CAAGCAGTTG 60

CCCCAGAAGT TTTGCTGAAC TTTACATCAG TTTGGGAAGC TGATGCCAAG AAACGGCCAT 120

CATTCAAGCA AATCATTTCA ATCCTGGGTC CATGTCAAAT GACACGAGCC TTCCTGCAAG 180

TGTAACTCAT TCCTACACAA CAAGGCGGAG TGGAGGT 217 

( 2 ) INFORMATION FOR SEQ ID NO:28: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 156 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Bone Marrow 
( B ) CLONE: 132750

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:28:

GTAGATTTGA CTCTGTTGTT TTCTCTCGTA GTTCCCAAAC TCATGGAAGT CTGTTTTTAT 60

CAATATGATG TAAAGTCTGA AATATACAGC TTTGGAATCG TCCTCTGGGA AATCGCCACT 120

GGAGATATCC CGTTTCAAGG CTGTAATTCT OAGAAG 156 

( 2 ) INFORMATION FOR SEQ ID NO:29:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 224 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: T Lymphocyte 
( B ) CLONE: 140052 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:29:

TGTAAATAAG GCCCTTCTCC ACTTGACTTC AGGCAGCAGA TTGTCTAGAA GCCTAAGGAC 60

AGCAATTTCT CTGACAAGAC AAAGTAGATA TTTTATACCA GGGGTTGGCA AACTACTGCC 120

CACGGGCCGA ATTTGGCCCA GTCTGTTTTT GTATGGTGCA AACTAAAAAT GATTTTTACA 180

TTTTTAAAGA GTTATAAAAG AAAAAAATAT GTGGTCTGTG AAAT 224

( 2 ) INFORMATION FOR SEQ ID NO:30: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 198 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: T Lymphocyte 
( B ) CLONE: 146392 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:30:

TTTTCTTTGT GTTTTTTTTT GTTCCAGTTT ATTTTAAATG CATATTTTAG TTGATTGCTT 60

TTTTAAAAAG CCCCCTCTGG CCTCCTGATT CCAGCTAGTG TCAGCAGTGG GATACCTGCG 120

CTTGAAGGAC ATCATCCACC GTGACATCAA GGATGAGAAC ATCGTGATCG CCGAGGACTT 180 

CACAATCAAG CTGATAGT 198 

( 2 ) INFORMATION FOR SEQ ID NO:31:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 210 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: THP-1 Phorbol LPS 
( B ) CLONE: 156108 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:31:

TGAAAACTAT GAACCTGGAC AAAAATCAAG GGCCAGTATC AAGCACGATA TATATAGCTA 60

TGCAGTTATC ACATGGGAAG TGTTATCCAG AAAACAGCCT TTTGAAGATG TCACCAATCC 120

TTTGCAGATA ATGTATAGTG TGTCACAAGG ACATCGACCT GTTATTAATG AAGAAAGTTT 180

GCCATATGAT ATACCTCACC GAGCACGTAT 210

( 2 ) INFORMATION FOR SEQ ID NO:32: 

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 202 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Bone Marrow
( B ) CLONE: 173627

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:32:

AGAAGATCGG GGCCGGCTTC TTCTCTGAGG TCTACAAGGT TCGGCACCGA CAGTCAGGGC 60

AAGTATGGTG CTGAAGATGA ACAAGCTCCC CAGTAACCGG GOCAACACAC TACGGGAAGT 120

GCAGCTGATG AACCGGCTCA GGCACCCCAA CATCCTAAGG TTCATGGGAG TCTGTGTGCA 180 

CCAGGGACAG CTGCACGCTC TT 202 

( 2 ) INFORMATION FOR SEQ ID NO:33: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 222 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Placenta 
( B ) CLONE: 181971 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

CGTTTTTGGA GGGTTCACAC CTGTCCCTTT CAAATGCTGG CGCTTTCACA CACTCCTTCT 60 

CTCCTGCCAG CACCTTCTGG TCTCAGGAGC ATTGCAGGAT GTTGTGTGAG TAAGTATGGG 120 

AGACACTTTA GTATGGCTTT TTTCAGCTTA GCCTCCTGTT ATCAGAGAGC AGTCTCTTTC 180 

AGTGTCAAGG TTTGAGTACT AGATGGTGGA GAAAGCCTGT TT 222 

( 2 ) INFORMATION FOR SEQ ID NO:34:

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 192 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Placenta 
( B ) CLONE: 182538 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:34:

CTTGGGGTGG TAAAACTTGG AGATCTTGGG CTTGGCCGGT TTTTCAGCTC AAAAACCACA 60

GCTGCACATT CTTTAGTTGG TACGCCTTAT TACATGTCTC CAGAGAGAAT ACATGAAAAT 120

GGATACAACT TCAAATCTGA CATCTGGTCT CTTGGCTGTC TACTATATGA GATGGCTGCA 180

TTACAAAGTC CT 192

( 2 ) INFORMATION FOR SEQ ID NO:35:

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 152 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Cardiac Muscle
( B ) CLONE: 184416

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:35:

CTATGGAAGG CCGCTGGCAG GGCAATGACA TTGTCGTGAA GGTGCTGAAG GTTCGACACT 60

GGAGTACAAG GAAGAGCAGG GACTTCAATG AAGAGTGTCC CCGGCTCAGG ATTTTTCCCA 120

TCCAAATGTG CTCCCAGTGC TAGGTGCCTG CC 152 

( 2 ) INFORMATION FOR SEQ ID NO:36: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 152 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Rheumatoid Synovium 
( B ) CLONE: 191283 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

CAACTACAGT GAACCTAAAA TGCCTCTAAT ACCTTTGCAA TTATCTTTAA GAGGATATCT 60

TATGAGTGAA ATTAACTTGT GCAACTACTT TCCTATTCAC TTTTTTACAG AGACTTAAAA 120

CCAGAGAATA TTTCTAGATT CACAGGGACA CT 152

( 2 ) INFORMATION FOR SEQ ID NO:37: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 199 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Rheumatoid Synovium
( B ) CLONE: 192268

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:37:

AGTGGACTOC AGTAAGCAGA GCTTCCTGAC CGAGGTGGAG CAGCTGTCCA GGTTTCGTCA 60

CCCAAACATT GTGGACTTTC TGGCTACTGT GCTCAGAACG GCTTCTACTG CCTGGTGTAC 120

GGCTTCCTGC CCAACGGCTC CCTGGAGGAC CGTTCCACTG CCAGACCCAG GCCTGCCCAC 180

CTCTCTCCTG GCCTCAGCG 199 



( 2 ) INFORMATION FOR SEQ ID NO:38:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 189 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Stomach
( B ) CLONE: 214915

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:38:

AGAAGATCCA GTACCTGGTG TATCAATGCT CAAAGGCCTT AAGTACATCC ACTCTCTGGG 60

GTCGTGCACA GGGACCTGAA GCCAGGCAAC CTGGCTGTGA ATAGGACTGT AACTGAAGAT 120 

TCTGGATTTT GGGCTGGCGC GACATGCAGA CGCCGAGATG ACTGGCTACG TGGTGACCCG 180 

CTGGTACCT 189 

( 2 ) INFORMATION FOR SEQ ID NO:39: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 167 base pairs 
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Pancreas
( B ) CLONE: 223163

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:39:

CTTGCTCTTC TGACAGGATG AGAGTTATTA TAAGCAAATC CTACCTAGAG GCTTTTAACT 60

CTAATGGGAA TAACTTGCAA CTAAAAGACC CAACTTGCAG ACCAAAATTA TCAAATGTTG 120

TGGATTTTCT GTCCCTCTTA ATGGATGTGG TACAATCAGA AAGGTAG 167

( 2 ) INFORMATION FOR SEQ ID NO:40:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 197 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Small Intestine
( B ) CLONE: 237002

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:40:

CCCAAACCTG CCCAGCCAGC CCTGAAAATG CAAGTTTTGT ACGATTTTGA AGCTAGGAAC 60

CCACGGGAAC TGACTGTGGT CCAGGGAGAG AAGCTGGAGG TTTGGACCAC AGCAAGCGGT 120

GGTGGCTGGT GAAGAATAGG CGGGACGGAG CGGCTACATT CCAAGCAACA TCTGGGCCCC 180 

TACAGCCGGG GACCCCG 197 

( 2 ) INFORMATION FOR SEQ ID NO:41: 

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 207 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Hippocampus 
( B ) CLONE: 239990 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:41:

CCAAGATGCT GGAGGAACTC AAGCCGAGAC TTGTACCAAG GAGAGATGAG CAGGAAGGAG 60

GCAGAGGGCT CTGAGAAAGA CGGGACTTCC TGGTCAGGAA GAGCACCACC AACCCGGGCT 120

CCTTTTCCTC ACGGGCATGC ACAATGGCCA GGCAAGCACC TGCTGCTCTT GGACCCAGAA 180

GGCACGTCCG GACAAAGGCA GAGTCTT 207

( 2 ) INFORMATION FOR SEQ ID NO:42: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 195 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA 

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Hippocampus
( B ) CLONE: 240142

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:42:

GTCACCGGAG AGGATCCATG AGAACGGCTA CAACTTCAAG TCCGACATCT GGTCCTTGGG 60

CTGTCTGCTG TACGAGATGG CAGCCCTCCA GAGCCCCTTC TATGGAGATA AGATGAATCT 120 

TTCTCCCTGT GCCAGAAGAT CGAGCAGTGT GACTACCCCC CACTCCCCGG GGAGCACTAC 180 

TCCGAGAAGT TACGT 195 

( 2 ) INFORMATION FOR SEQ ID NO:43: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 213 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single 
( D ) TOPOLOGY: linear 

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Testes 
( B ) CLONE: 275781 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

CTCGTCTATT CGGCACGAGT TTCATTGTCG AAGGAAATAT AAACTGTCTG GAAGATCTGG 60

TGTAGCTCCT TCGAGACATC TTTGGCGATC AGCATCACCA ACGGTAAGAA GTGTAGTAAG 120

CCAGATCTCA GGGCCAGGCA TCCCCAGTTG CTGTACAAGA GCAGGCTTTC AAGATGCTTC 180

AAGGTCCCTG TCCATCAATA TGCTACACAT TTG 213

( 2 ) INFORMATION FOR SEQ ID NO:44: 

( i ) SEQUENCE CHARACTERISTICS: 
( A ) LENGTH: 425 base pairs 
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE:

( A ) LIBRARY: Eosinophils 
( B ) CLONE: 285465 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:44: 

AAATACTTGA AGGAGTTTAT TATCTACATC AGAATAACAT TGTACACCTT GATTTAAAGC 60

CACAGAATAT ATTACTGAGC AGCATATACC CTCTCGGGGA CATTAAAATA GTAGATTTTG 120

GAATGTCTCG AAAAATAGGG CATGCGTGTG AACTTCGGGA AATCATGGGA ACACCAGAAT 180

ATTTAGCTCC AGAAATCCTG AACTATGATC CCATTACCAC AGCAACAGAT ATGTGGAATA 240

TTGGTATAAT AGCATATATG TTGTTAACTC ACACATCACC ATTTGTGGGA GAAGATAATC 300

AAGAAACATA CCTCAATATC TCTCAAOTTA ATOTAGATTA TTCGGAAGGA ACTTTTTCAT 360

CAGTTTCACA GCTGCCACAG ACTTTATTCA GAGCTTTTAG TAAAATCAGA GGAAAGGCCC 420

ACAGC 425

( 2 ) INFORMATION FOR SEQ ID NO:45:

( i ) SEQUENCE CHARACTERISTICS: 

( A ) LENGTH: 1851 base pairs 
( B ) TYPE: nucleic acid 
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: cDNA

( v i i ) IMMEDIATE SOURCE: 

( A ) LIBRARY: Stomach 
( B ) CLONE: 214915E 

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO:45: 

GCCCGTTGGG CCGCGAACGC AGCCGCCACG CCGGGGCCGC CGAGATCGGG TGCCCGGGAT 60 

GAGCCTCATC CGGAAAAAGG GCTTCTACAA GCAGGACGTC AACAAGACCG CCTGGGAGCT 120 

GCCCAAGACC TACGTGTCCC CGACGCACGT CGGCAGCGGG GCCTATGGCT CCGTGTGCTC 180 

GGCCATCGAC AAGCGGTCAG GGGAGAAGGT GGCCATCAAG AAGCTGAGCC GACCCTTTCA 240 

GTCCGAGATC TTCGCCAAGC GCGCCTACCG GGAGCTGCTG TTGCTGAAGC ACATGCAGCA 300 

TGAGAACGTC ATTGGGCTCC TGGATGTCTT CACCCCAGCC TCCTCCCTGG AACTTCTATG 360

ACTTCTACCT GGTGATGCCC TTCATGCAGA CGGATCTGCA GAAGATCATG GGGATGGAGT 420

TCAGTGAGGA GAAGATCCAG TACCTGGTGT ATCAGATGCT CAAAGGCCTT AAGTACATCC 480

ACTCTGCTGG GGTCGTGCAC AGGGACCTGA AGCCAGGCAA CCTGGCTGTG AATGAGGACT 540 

GTGAACTGAA GATTCTGGAT TTGGGGCTGG CGCGACATGC AGACGCCGAG ATGACTGGCT 600 

ACGTGGTGAC CCGCTGGTAC CGAGCCCCCG AGGTGATCCT CAGCTGGATG CACTACAACC 660

AGACAGTGGA CATCTGGTCT GTGGGCTGTA TCATGGCAGA GATGCTGACA GGGAAAACTC 720 

TGTTCAAGGG GAAAGATTAC CTGGACCAGC TGACCCAGAT CCTGAAAGTG ACCGGGGTGC 780

CTGGCACGGA GTTTGTGCAG AAGCTGAACG ACAAAGCGGC CAAATCCTAC ATCCAGTCCC 840

TGCCACAGAC CCCCAGGAAG GATTTCACTC AGCTGTTCCC ACGGGCCAGC CCCCAGCCTG 900

CGCACCTGCT GGAGAAGATG CTGGAGCTAG ACGTGGACAA GCGCCTGACG GCCGCGCAGG 960

CCCTCACCCA TCCCTTCTTT GAACCCTTCC GGGACCCTGA GGAAGAGACG GAGGCCCAGC 1020 

AGCCGTTTGA TGATTCCTTA GAACACGAGA AACTCACAGT GGATGAATGG AAGCAGCACA 1080 

TCTACAAGGA GATTGTGAAC TTCAGCCCCA TTGCCCGGAA GGACTCACGG CGCCGGAGTG 1140 

GCATGAAGCT GTAGGGACTC ATCTTGCATG GCACCGCCGG CCAGACACTG CCCAAGGACC 1200

AGTATTTGTC ACTACCAAAC TCAGCCCTTC TTGGAATACA GCCTTTCAAG CAGAGGACAG 1260 

AAGGGTCCTT CTCCTTATGT GGGAAATGGG CCTAGTAGAT GCAGAATTCA AAGATGTCGG 1320 

TTGGGAGAAA CTAGCTCTGA TCCTAACAGG CCACGTTAAA CTGCCCATCT GGAGAATCGC 1380

CTGCAGGTGG GGCCCTTTCC TTCCCGCCAG ACTGGGGCTG AGTGGGCGCT GAGCCAGGCC 1440

GGGGGCCTAT GGCAGTGATG CTGTGTTGGT TTCCTAGGGA TGCTCTAACG AATTACCACA 1500

AACCTGGTGG ATTGAAACAG CAGAACTTGA TTCCCTTACA GTTCTGGAGG CTGGAAATCT 1560

GGGATGCAGG TGTTGGCAGG GCTGTGGTCC CTTTGAAGGC TCTGGGGAAG AATCCTTCCT 1620

TGGCTCTTTT TAGCTTGTGG CGGCAGTGGG CAGTCCGTGG CATTCCCCAG CTTATTGCTG 1680

CATCACTCCA GTCTCTGTCT CTTCTGTTCT CTCCTCTTTT AACAACAGTC ATTGGATTTA 1740

GGGCCCACCC TAATCCTGTG TGATCTTATC TTGATCCTTA TTAATTAAAC CTGCAAATAC 1800

TCTAGTTCCA AATAAAGTCA CATTCTCAGG TAAAAAAAAA AAAAAAAAAA A 1851



We claim: 

1. A purified polynucleotide having a nucleic acid
sequence selected from the group consisting of SEQ ID
NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ
ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ
ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID NO:12,
SEQ ID NO:13, SEQ ID NO:14, SEQ ID NO:15, SEQ ID
NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19,
SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID
NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26,
SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, SEQ ID
NO:30, SEQ ID NO:31, SEQ ID NO:32, SEQ ID NO:33,
SEQ ID NO:34, SEQ ID NO:35, SEQ ID NO:36, SEQ ID
NO:37, SEQ ID NO:38, SEQ ID NO:39, SEQ ID NO:40,
SEQ ID NO:41, SEQ ID NO:42, SEQ ID NO:43, and SEQ
ID NO:44.

2. An expression vector comprising the polynucleotide of
claim 1.

3. A host cell transformed with the expression vector of
claim 2.

4. A method for producing and purifying a polypeptide,
said method comprising the steps of:

a) culturing the host cell of claim 3 under conditions
suitable for the expression of the peptide; and

b) recovering the polypeptide from the host cell culture.

*****

SECRETED PROTEINS AND
POLYNUCLEOTIDES ENCODING THEM

FIELD OF THE INVENTION

The present invention provides novel polynucleotides and
proteins encoded by such polynucleotides, along with
therapeutic, diagnostic and research utilities for these
polynucleotides and proteins.



BACKGROUND OF THE INVENTION

Technology aimed at the discovery of protein factors
(including e.g., cytokines, such as lymphokines, interferons,
CSFs and interleukins) has matured rapidly over the past
decade. The now routine hybridization cloning and expression
cloning techniques clone novel polynucleotides
"directly" in the sense that they rely on information directly
related to the discovered protein (i.e., partial DNA/amino
acid sequence of the protein in the case of hybridization
cloning; activity of the protein in the case of expression
cloning). More recent "indirect" cloning techniques such as
signal sequence cloning, which isolates DNA sequences
based on the presence of a now well-recognized secretory
leader sequence motif, as well as various PCR-based or low
stringency hybridization cloning techniques, have advanced
the state of the art by making available large numbers of
DNA/amino acid sequences for proteins that are known to
have biological activity by virtue of their secreted nature in
the case of leader sequence cloning, or by virtue of the cell
or tissue source in the case of PCR-based techniques. It is to
these proteins and the polynucleotides encoding them that
the present invention is directed.

SUMMARY OF THE INVENTION 

In one embodiment, the present invention provides a
composition comprising an isolated polynucleotide selected
from the group consisting of:

(a) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1;

(b) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1 from nucleotide 247 to nucleotide
432;

(c) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1 from nucleotide 328 to nucleotide
432;

(d) a polynucleotide comprising the nucleotide sequence
of the full length protein coding sequence of clone
BD372_5 deposited under accession number ATCC
98146;

(e) a polynucleotide encoding the full length protein
encoded by the cDNA insert of clone BD372_5 deposited
under accession number ATCC 98146;

(f) a polynucleotide comprising the nucleotide sequence
of the mature protein coding sequence of clone
BD372_5 deposited under accession number ATCC
98146;

(g) a polynucleotide encoding the mature protein encoded
by the cDNA insert of clone BD372_5 deposited under
accession number ATCC 98146;

(h) a polynucleotide encoding a protein comprising the
amino acid sequence of SEQ ID NO:2;

(i) a polynucleotide encoding a protein comprising a
fragment of the amino acid sequence of SEQ ID NO:2
having biological activity;

(j) a polynucleotide which is an allelic variant of a
polynucleotide of (a)-(g) above;

(k) a polynucleotide which encodes a species homologue
of the protein of (h) or (i) above.

Preferably, such polynucleotide comprises the nucleotide
sequence of SEQ ID NO:1 from nucleotide 247 to nucleotide
432; the nucleotide sequence of SEQ ID NO:1 from
nucleotide 328 to nucleotide 432; the nucleotide sequence of
the full length protein coding sequence of clone BD372_5
deposited under accession number ATCC 98146; or the
nucleotide sequence of the mature protein coding sequence
of clone BD372_5 deposited under accession number
ATCC 98146. In other preferred embodiments, the
polynucleotide encodes the full length or mature protein encoded
by the cDNA insert of clone BD372_5 deposited under
accession number ATCC 98146.
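
By way of illustration only, the nucleotide ranges recited above can be checked with a short sketch. It assumes that positions are numbered from 1 and that both endpoints are included, which is the usual convention for sequence listings but is not stated in this passage. The function names and the toy sequence are hypothetical; SEQ ID NO:1 itself is not reproduced here.

```python
def span_length(start: int, end: int) -> int:
    """Length of a range numbered from 1 with both endpoints included (assumed convention)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return end - start + 1


def subsequence(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end (1-based, inclusive) of seq."""
    length = span_length(start, end)  # also validates the coordinates
    if end > len(seq):
        raise ValueError("range extends past the end of the sequence")
    piece = seq[start - 1:end]
    assert len(piece) == length
    return piece


# Ranges recited for SEQ ID NO:1 in the paragraph above.
for start, end in [(247, 432), (328, 432)]:
    n = span_length(start, end)
    print(f"nucleotides {start}-{end}: {n} nt, {n // 3} codons, remainder {n % 3}")

# Toy sequence (not SEQ ID NO:1) showing the slicing convention.
print(subsequence("ACGTACGT", 2, 4))
```

Under that assumption both recited ranges span a whole number of codons (186 and 105 nucleotides), which is consistent with their being coding regions, although the passage itself does not say so.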

Other embodiments provide the gene corresponding to the
cDNA sequence of SEQ ID NO:1 or SEQ ID NO:3.

In other embodiments, the present invention provides a 
composition comprising a protein, wherein said protein 
comprises an amino acid sequence selected from the group 
consisting of: 

(a) the amino acid sequence of SEQ ID N02; 

(b) fragments of the amino acid sequence of SEQ ID 
NO:2; and 

(c) the amino acid sequence encoded by the cDNA insert 
of clone BD372_5 deposited under accession number ATCC 
98146; 

the protein being substantially free from other mammalian 
proteins. Preferably such protein comprises the 
amino acid sequence of SEQ ID NO:2. 

In one embodiment, the present invention provides a 
composition comprising an isolated polynucleotide selected 
from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence 
of SEQ ID NO:4; 

(b) a polynucleotide comprising the nucleotide sequence 
of SEQ ID NO:4 from nucleotide 316 to nucleotide 
501; 

(c) a polynucleotide comprising the nucleotide sequence 
of the full length protein coding sequence of clone 
BR533_4 deposited under accession number ATCC 
98146; 

(d) a polynucleotide encoding the full length protein 
encoded by the cDNA insert of clone BR533_4 deposited 
under accession number ATCC 98146; 

(e) a polynucleotide comprising the nucleotide sequence 
of the mature protein coding sequence of clone 
BR533_4 deposited under accession number ATCC 
98146; 

(f) a polynucleotide encoding the mature protein encoded 
by the cDNA insert of clone BR533_4 deposited under 
accession number ATCC 98146; 

(g) a polynucleotide encoding a protein comprising the 
amino acid sequence of SEQ ID NO:5; 

(h) a polynucleotide encoding a protein comprising a 
fragment of the amino acid sequence of SEQ ID NO:5 
having biological activity; 

(i) a polynucleotide which is an allelic variant of a 
polynucleotide of (a)-(d) above; 

(j) a polynucleotide which encodes a species homologue 
of the protein of (g) or (h) above. 

Preferably, such polynucleotide comprises the nucleotide 
sequence of SEQ ID NO:4 from nucleotide 316 to nucleotide 
501; the nucleotide sequence of the full length protein 
coding sequence of clone BR533_4 deposited under accession 
number ATCC 98146; or the nucleotide sequence of the 
mature protein coding sequence of clone BR533_4 deposited 
under accession number ATCC 98146. In other preferred 
embodiments, the polynucleotide encodes the full 
length or mature protein encoded by the cDNA insert of 
clone BR533_4 deposited under accession number ATCC 
98146. 

Other embodiments provide the gene corresponding to the 
cDNA sequence of SEQ ID NO:4 or SEQ ID NO:6. 

In other embodiments, the present invention provides a 
composition comprising a protein, wherein said protein 
comprises an amino acid sequence selected from the group 
consisting of: 

(a) the amino acid sequence of SEQ ID NO:5; 

(b) fragments of the amino acid sequence of SEQ ID 
NO:5; and 

(c) the amino acid sequence encoded by the cDNA insert 
of clone BR533_4 deposited under accession number ATCC 
98146; 

the protein being substantially free from other mammalian 
proteins. Preferably such protein comprises the 
amino acid sequence of SEQ ID NO:5. 

In one embodiment, the present invention provides a 
composition comprising an isolated polynucleotide selected 
from the group consisting of: 

(a) a polynucleotide comprising the nucleotide sequence 
of SEQ ID NO:7; 

(b) a polynucleotide comprising the nucleotide sequence 
of SEQ ID NO:7 from nucleotide 113 to nucleotide 
433; 

(c) a polynucleotide comprising the nucleotide sequence 
of the full length protein coding sequence of clone 
CC288_9 deposited under accession number ATCC 
98146; 

(d) a polynucleotide encoding the full length protein 
encoded by the cDNA insert of clone CC288_9 deposited 
under accession number ATCC 98146; 

(e) a polynucleotide comprising the nucleotide sequence 
of the mature protein coding sequence of clone 
CC288_9 deposited under accession number ATCC 
98146; 

(f) a polynucleotide encoding the mature protein encoded 
by the cDNA insert of clone CC288_9 deposited under 
accession number ATCC 98146; 

(g) a polynucleotide encoding a protein comprising the 
amino acid sequence of SEQ ID NO:8; 

(h) a polynucleotide encoding a protein comprising a 
fragment of the amino acid sequence of SEQ ID NO:8 
having biological activity; 

(i) a polynucleotide which is an allelic variant of a 
polynucleotide of (a)-(d) above; 

(j) a polynucleotide which encodes a species homologue 
of the protein of (g) or (h) above. 

Preferably, such polynucleotide comprises the nucleotide 
sequence of SEQ ID NO:7 from nucleotide 113 to nucleotide 
433; the nucleotide sequence of the full length protein 
coding sequence of clone CC288_9 deposited under accession 
number ATCC 98146; or the nucleotide sequence of the 
mature protein coding sequence of clone CC288_9 deposited 
under accession number ATCC 98146. In other preferred 
embodiments, the polynucleotide encodes the full 
length or mature protein encoded by the cDNA insert of 
clone CC288_9 deposited under accession number ATCC 
98146. In yet other preferred embodiments, the present 
invention provides a polynucleotide encoding a protein 
comprising the amino acid sequence of SEQ ID NO:8 from 
amino acid 1 to amino acid 77. 

Other embodiments provide the gene corresponding to the 
cDNA sequence of SEQ ID NO:7. 

In other embodiments, the present invention provides a 
composition comprising a protein, wherein said protein 
comprises an amino acid sequence selected from the group 
consisting of: 

(a) the amino acid sequence of SEQ ID NO:8; 

(b) the amino acid sequence of SEQ ID NO:8 from amino 
acid 1 to amino acid 77; 

(c) fragments of the amino acid sequence of SEQ ID 
NO:8; and 

(d) the amino acid sequence encoded by the cDNA insert 
of clone CC288_9 deposited under accession number ATCC 
98146; 

the protein being substantially free from other mammalian 
proteins. Preferably such protein comprises the 
amino acid sequence of SEQ ID NO:8 or the amino acid 
sequence of SEQ ID NO:8 from amino acid 1 to amino acid 
77. 

In certain preferred embodiments, the polynucleotide is 
operably linked to an expression control sequence. The 
invention also provides a host cell, including bacterial, 
yeast, insect and mammalian cells, transformed with such 
polynucleotide compositions. 

Processes are also provided for producing a protein, 
which comprise: 

(a) growing a culture of the host cell transformed with 
such polynucleotide compositions in a suitable culture 
medium; and 

(b) purifying the protein from the culture. 

The protein produced according to such methods is also 
provided by the present invention. Preferred embodiments 
include those in which the protein produced by such process 
is a mature form of the protein. 

Protein compositions of the present invention may further 
comprise a pharmaceutically acceptable carrier. Composi- 
tions comprising an antibody which specifically reacts with 
such protein are also provided by the present invention. 

Methods are also provided for preventing, treating or 
ameliorating a medical condition which comprises 
administering to a mammalian subject a therapeutically effective 
amount of a composition comprising a protein of the present 
invention and a pharmaceutically acceptable carrier. 

DETAILED DESCRIPTION 

ISOLATED PROTEINS AND 
POLYNUCLEOTIDES 

Nucleotide and amino acid sequences are reported below 
for each clone and protein disclosed in the present application. 
In some instances the sequences are preliminary and 
may include some incorrect or ambiguous bases or amino 
acids. The actual nucleotide sequence of each clone can 
readily be determined by sequencing of the deposited clone 
in accordance with known methods. The predicted amino 
acid sequence (both full length and mature) can then be 
determined from such nucleotide sequence. The amino acid 
sequence of the protein encoded by a particular clone can 
also be determined by expression of the clone in a suitable 
host cell, collecting the protein and determining its 
sequence. 

For each disclosed protein applicants have identified what 
they have determined to be the reading frame best identifiable 
with sequence information available at the time of 
filing. Because of the partial ambiguity in reported sequence 
information, reported protein sequences include "Xaa" 
designators. These "Xaa" designators indicate either (1) a 
residue which cannot be identified because of nucleotide 
sequence ambiguity or (2) a stop codon in the determined 
nucleotide sequence where applicants believe one should not 
exist (if the nucleotide sequence were determined more 
accurately). 
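The two uses of the "Xaa" designator described above can be illustrated with a minimal Python sketch. It is not the applicants' procedure; the function name `translate_with_xaa` is hypothetical, the standard genetic code is assumed, and the sketch assumes that every stop codon before the final codon is one that "should not exist", whereas the text leaves that judgment to the applicants. For simplicity, any codon containing an ambiguous base is reported as Xaa even when the amino acid could still be inferred.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_with_xaa(nucleotides):
    """Translate one reading frame, writing 'Xaa' where a residue is unclear."""
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    residues = []
    for index, codon in enumerate(codons):
        aa = CODON_TABLE.get(codon)           # None if the codon holds N etc.
        is_last = index == len(codons) - 1
        if aa is None:
            residues.append("Xaa")            # case (1): ambiguous nucleotide
        elif aa == "*" and not is_last:
            residues.append("Xaa")            # case (2): assumed spurious stop
        elif aa == "*":
            residues.append("Stop")
        else:
            residues.append(aa)
    return residues
```

For example, the made-up input `"ATGGCNTAAGGGTGA"` contains one ambiguous codon (GCN) and one internal stop (TAA), and both are reported as Xaa.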

As used herein a "secreted" protein is one which, when 
expressed in a suitable host cell, is transported across or 
through a membrane, including transport as a result of signal 
sequences in its amino acid sequence. "Secreted" proteins 
include without limitation proteins secreted wholly (e.g., 
soluble proteins) or partially (e.g., receptors) from the cell in 
which they are expressed. "Secreted" proteins also include 
without limitation proteins which are transported across the 
membrane of the endoplasmic reticulum. 
Clone "BD372_5" 

A polynucleotide of the present invention has been identified 
as clone "BD372_5". BD372_5 was isolated from a 
human fetal kidney cDNA library using methods which are 
selective for cDNAs encoding secreted proteins. BD372_5 
is a full-length clone, including the entire coding sequence 
of a secreted protein (also referred to herein as "BD372_5 
protein"). 

The nucleotide sequence of the 5' portion of BD372_5 as 
presently determined is reported in SEQ ID NO:1. What 
applicants presently believe is the proper reading frame for 
the coding region is indicated in SEQ ID NO:2. The predicted 
amino acid sequence of the BD372_5 protein corresponding 
to the foregoing nucleotide sequence is reported in SEQ ID 
NO:2. Amino acids 1 to 27 are the predicted leader/signal 
sequence, with the predicted mature amino acid sequence 
beginning at amino acid 28. Additional nucleotide sequence 
from the 3' portion of BD372_5, including the polyA tail, is 
reported in SEQ ID NO:3. 
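The leader and mature-protein boundaries stated here can be cross-checked against the nucleotide coordinates recited for SEQ ID NO:1 in the summary (nucleotides 247 to 432 for the full-length coding sequence and 328 to 432 for the mature coding sequence). The Python sketch below does only that arithmetic. The helper names are hypothetical, and it assumes 1-based inclusive coordinates and a reading frame that starts at nucleotide 247; the text does not say whether the range includes a stop codon.

```python
def codons_in_range(first_nt, last_nt):
    """Number of whole codons in an inclusive, 1-based nucleotide range."""
    length = last_nt - first_nt + 1
    if length % 3 != 0:
        raise ValueError("range is not a whole number of codons")
    return length // 3


def residue_at(first_coding_nt, nt_position):
    """1-based amino acid number whose codon begins at nt_position."""
    offset = nt_position - first_coding_nt
    if offset < 0 or offset % 3 != 0:
        raise ValueError("position is not in frame with the coding start")
    return offset // 3 + 1


# Coordinates recited for SEQ ID NO:1 (clone BD372_5).
full_length_codons = codons_in_range(247, 432)
mature_start_residue = residue_at(247, 328)
leader_length = mature_start_residue - 1
```

Under these assumptions the range 247 to 432 spans 186 nucleotides, and the mature coding sequence starting at nucleotide 328 begins 81 nucleotides (27 codons) downstream of the start, which agrees with a 27-residue leader and a mature sequence beginning at amino acid 28.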

The EcoRI/NotI restriction fragment obtainable from the 
deposit containing clone BD372_5 should be approxi- 
mately 2300 bp. 

The nucleotide sequence disclosed herein for BD372_5 
was searched against the GenBank database using BLASTN/ 
BLASTX and FASTA search protocols. BD372_5 demonstrated 
at least some identity with ESTs identified as 
"yc90f12.s1 Homo sapiens cDNA clone 23278 3'" (R39276, 
BlastN) and "EST05537 Homo sapiens cDNA clone 
HFBEM26" (T07647, Fasta). Based upon identity, 
BD372_5 proteins and each identical protein or peptide 
may share at least some activity. 

Clone "BR533_4" 

A polynucleotide of the present invention has been identified 
as clone "BR533_4". BR533_4 was isolated from a 
human fetal kidney cDNA library using methods which are 
selective for cDNAs encoding secreted proteins. BR533_4 
is a full-length clone, including the entire coding sequence 
of a secreted protein (also referred to herein as "BR533_4 
protein"). 

The nucleotide sequence of the 5' portion of BR533_4 as 
presently determined is reported in SEQ ID NO:4. What 
applicants presently believe is the proper reading frame for 
the coding region is indicated in SEQ ID NO:5. The predicted 
amino acid sequence of the BR533_4 protein corresponding 
to the foregoing nucleotide sequence is reported in SEQ ID 
NO:5. Additional nucleotide sequence from the 3' portion of 
BR533_4, including the polyA tail, is reported in SEQ ID 
NO:6. 

The EcoRI/NotI restriction fragment obtainable from the 
deposit containing clone BR533_4 should be approximately 
2850 bp. 

The nucleotide sequence disclosed herein for BR533_4 
was searched against the GenBank database using BLASTN/ 
BLASTX and FASTA search protocols. BR533_4 demonstrated 
at least some homology with murine semaphorin E 
(X85994, BlastN). BR533_4 also shows at least some 
identity with an EST identified as "yy80d10.s1 Homo 
sapiens cDNA clone 279859 3'" (N38844, BlastN). Based 
upon homology, BR533_4 proteins and each homologous 
protein or peptide may share at least some activity. 

Clone "CC288_9" 

A polynucleotide of the present invention has been identified 
as clone "CC288_9". CC288_9 was isolated from a 
human adult brain cDNA library using methods which are 
selective for cDNAs encoding secreted proteins. CC288_9 
is a full-length clone, including the entire coding sequence 
of a secreted protein (also referred to herein as "CC288_9 
protein"). 

The nucleotide sequence of CC288_9 as presently determined 
is reported in SEQ ID NO:7. What applicants presently 
believe to be the proper reading frame and the predicted 
amino acid sequence of the CC288_9 protein 
corresponding to the foregoing nucleotide sequence is 
reported in SEQ ID NO:8. 

The nucleotide sequence disclosed herein for CC288_9 
was searched against the GenBank database using BLASTN/ 
BLASTX and FASTA search protocols. No hits were found 
in the database. 

Deposit of Clones 

Clones BD372_5, BR533_4 and CC288_9 were deposited 
on Aug. 22, 1996 with the American Type Culture 
Collection under accession number ATCC 98146, from 
which each clone comprising a particular polynucleotide is 
obtainable. Each clone has been transfected into separate 
bacterial cells (E. coli) in this composite deposit. Each clone 
can be removed from the vector in which it was deposited by 
performing an EcoRI/NotI digestion (5' site, EcoRI; 3' site, 
NotI) to produce the appropriately sized fragment for such 
clone (approximate clone size fragments are identified 
below). Bacterial cells containing a particular clone can be 
obtained from the composite deposit as follows: 

An oligonucleotide probe or probes should be designed to 
the sequence that is known for that particular clone. This 
sequence can be derived from the sequences provided 
herein, or from a combination of those sequences. The 
sequence of the oligonucleotide probe that was used to 
isolate each full-length clone is identified below, and should 
be most reliable in isolating the clone of interest. 

Clone        Probe Sequence 
BD372_5      SEQ ID NO: 9 
BR533_4      SEQ ID NO: 10 
CC288_9      SEQ ID NO: 11 


In the sequences listed above which include an N at position 
2, that position is occupied in preferred probes/primers by a 
biotinylated phosphoramidite residue rather than a nucleotide 
(such as, for example, that produced by use of biotin 
phosphoramidite (1-dimethoxytrityloxy-2-(N-biotinyl-4- 
aminobutyl)-propyl-3-O-(2-cyanoethyl)-(N,N-diisopropyl)- 
phosphoramidite) (Glen Research, cat. no. 10-1953)). 

The design of the oligonucleotide probe should preferably 
follow these parameters: 

(a) It should be designed to an area of the sequence which 
has the fewest ambiguous bases ("N's"), if any; 

(b) It should be designed to have a Tm of approx. 80° C. 
(assuming 2 degrees for each A or T and 4 degrees for each G 
or C). 
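The two design parameters above lend themselves to a short calculation. The Python sketch below is illustrative only: the function names are hypothetical, the 2-degree/4-degree estimate is applied exactly as stated in parameter (b) (it is commonly referred to as the Wallace rule), and the way ties are broken between candidate windows is an assumption rather than anything the text prescribes.

```python
def wallace_tm(oligo):
    """Estimate Tm in degrees C: 2 per A or T, 4 per G or C.

    Any other symbol (for example N) contributes nothing to the estimate.
    """
    oligo = oligo.upper()
    at = sum(oligo.count(base) for base in "AT")
    gc = sum(oligo.count(base) for base in "GC")
    return 2 * at + 4 * gc


def count_ambiguous(oligo):
    """Count positions that are not an unambiguous A, C, G or T."""
    return sum(1 for base in oligo.upper() if base not in "ACGT")


def best_probe_window(sequence, length, target_tm=80):
    """Pick the window with the fewest ambiguous bases (parameter a),
    then the estimated Tm closest to target_tm (parameter b).

    Returns (start, window); start is a 0-based offset, and the earliest
    window wins any remaining tie.
    """
    if length <= 0 or length > len(sequence):
        raise ValueError("length must be between 1 and len(sequence)")

    def score(start):
        window = sequence[start:start + length]
        return (count_ambiguous(window), abs(wallace_tm(window) - target_tm))

    best = min(range(len(sequence) - length + 1), key=score)
    return best, sequence[best:best + length]
```

As a sanity check on the rule, an oligonucleotide of 20 G or C residues gives 20 x 4 = 80, so probes meeting parameter (b) fall between roughly 20 bases (all G/C) and 40 bases (all A/T).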

The oligonucleotide should preferably be labeled with γ-32P ATP (specific activity 6000 Ci/mmole) and T4 polynucleotide kinase using commonly employed techniques for labeling oligonucleotides. Other labeling techniques can also be used. Unincorporated label should preferably be removed by gel filtration chromatography or other established methods. The amount of radioactivity incorporated into the probe should be quantitated by measurement in a scintillation counter. Preferably, specific activity of the resulting probe should be approximately 4e+6 dpm/pmole. 

The bacterial culture containing the pool of full-length clones should preferably be thawed and 100 μl of the stock used to inoculate a sterile culture flask containing 25 ml of sterile L-broth containing ampicillin at 100 μg/ml. The culture should preferably be grown to saturation at 37° C., and the saturated culture should preferably be diluted in fresh L-broth. Aliquots of these dilutions should preferably be plated to determine the dilution and volume which will yield approximately 5000 distinct and well-separated colonies on solid bacteriological media containing L-broth containing ampicillin at 100 μg/ml and agar at 1.5% in a 150 mm petri dish when grown overnight at 37° C. Other known methods of obtaining distinct, well-separated colonies can also be employed. 

Standard colony hybridization procedures should then be used to transfer the colonies to nitrocellulose filters and lyse, denature and bake them. 

The filter is then preferably incubated at 65° C. for 1 hour with gentle agitation in 6x SSC (20x stock is 175.3 g NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with NaOH) containing 0.5% SDS, 100 μg/ml of yeast RNA, and 10 mM EDTA (approximately 10 mL per 150 mm filter). Preferably, the probe is then added to the hybridization mix at a concentration greater than or equal to 1e+6 dpm/mL. The filter is then preferably incubated at 65° C. with gentle agitation overnight. The filter is then preferably washed in 500 mL of 2x SSC/0.5% SDS at room temperature without agitation, preferably followed by 500 mL of 2x SSC/0.1% SDS at room temperature with gentle shaking for 15 minutes. A third wash with 0.1x SSC/0.5% SDS at 65° C. for 30 minutes to 1 hour is optional. The filter is then preferably dried and subjected to autoradiography for sufficient time to visualize the positives on the X-ray film. Other known hybridization methods can also be employed. 

The positive colonies are picked, grown in culture, and plasmid DNA isolated using standard procedures. The clones can then be verified by restriction analysis, hybridization analysis, or DNA sequencing. 

Fragments of the proteins of the present invention which are capable of exhibiting biological activity are also encompassed by the present invention. Fragments of the protein may be in linear form or they may be cyclized using known methods, for example, as described in H. U. Saragovi, et al., Bio/Technology 10, 773-778 (1992) and in R. S. McDowell, et al., J. Amer. Chem. Soc. 114, 9245-9253 (1992), both of which are incorporated herein by reference. Such fragments may be fused to carrier molecules such as immunoglobulins for many purposes, including increasing the valency of protein binding sites. For example, fragments of the protein may be fused through "linker" sequences to the Fc portion of an immunoglobulin. For a bivalent form of the protein, such a fusion could be to the Fc portion of an IgG molecule. Other immunoglobulin isotypes may also be used to generate such fusions. For example, a protein-IgM fusion would generate a decavalent form of the protein of the invention. 

The present invention also provides both full-length and mature forms of the disclosed proteins. The full-length form of such proteins is identified in the sequence listing by translation of the nucleotide sequence of each disclosed clone. The mature form of such protein may be obtained by expression of the disclosed full-length polynucleotide (preferably those deposited with ATCC) in a suitable mammalian cell or other host cell. The sequence of the mature form of the protein may also be determinable from the amino acid sequence of the full-length form. 

The present invention also provides genes corresponding to the cDNA sequences disclosed herein. The corresponding genes can be isolated in accordance with known methods using the sequence information disclosed herein. Such methods include the preparation of probes or primers from the disclosed sequence information for identification and/or amplification of genes in appropriate genomic libraries or other sources of genomic materials. 

Where the protein of the present invention is membrane-bound (e.g., is a receptor), the present invention also provides for soluble forms of such protein. In such forms part or all of the intracellular and transmembrane domains of the protein are deleted such that the protein is fully secreted from the cell in which it is expressed. The intracellular and transmembrane domains of proteins of the invention can be identified in accordance with known techniques for determination of such domains from sequence information. 

Species homologs of the disclosed polynucleotides and proteins are also provided by the present invention. Species homologs may be isolated and identified by making suitable probes or primers from the sequences provided herein and screening a suitable nucleic acid source from the desired species. 

The invention also encompasses allelic variants of the disclosed polynucleotides or proteins; that is, naturally-occurring alternative forms of the isolated polynucleotide which also encode proteins which are identical, homologous or related to that encoded by the polynucleotides. 

The isolated polynucleotide of the invention may be operably linked to an expression control sequence such as the pMT2 or pED expression vectors disclosed in Kaufman et al., Nucleic Acids Res. 19, 4485-4490 (1991), in order to produce the protein recombinantly. Many suitable expression control sequences are known in the art. General methods of expressing recombinant proteins are also known and are exemplified in R. Kaufman, Methods in Enzymology 185, 537-566 (1990). As defined herein "operably linked" means that the isolated polynucleotide of the invention and an expression control sequence are situated within a vector or cell in such a way that the protein is expressed by a host cell which has been transformed (transfected) with the ligated polynucleotide/expression control sequence. 

A number of types of cells may act as suitable host cells for expression of the protein. Mammalian host cells include, for example, monkey COS cells, Chinese Hamster Ovary (CHO) cells, human kidney 293 cells, human epidermal A431 cells, human Colo205 cells, 3T3 cells, CV-1 cells, other transformed primate cell lines, normal diploid cells, cell strains derived from in vitro culture of primary tissue, primary explants, HeLa cells, mouse L cells, BHK, HL-60, U937, HaK or Jurkat cells. 

Alternatively, it may be possible to produce the protein in lower eukaryotes such as yeast or in prokaryotes such as bacteria. Potentially suitable yeast strains include Saccharomyces cerevisiae, Schizosaccharomyces pombe, Kluyveromyces strains, Candida, or any yeast strain capable of expressing heterologous proteins. Potentially suitable bacterial strains include Escherichia coli, Bacillus subtilis, Salmonella typhimurium, or any bacterial strain capable of expressing heterologous proteins. If the protein is made in yeast or bacteria, it may be necessary to modify the protein produced therein, for example by phosphorylation or glycosylation of the appropriate sites, in order to obtain the functional protein. Such covalent attachments may be accomplished using known chemical or enzymatic methods. 

The protein may also be produced by operably linking the isolated polynucleotide of the invention to suitable control sequences in one or more insect expression vectors, and employing an insect expression system. Materials and methods for baculovirus/insect cell expression systems are commercially available in kit form from, e.g., Invitrogen, San Diego, Calif., U.S.A. (the MaxBac® kit), and such methods are well known in the art, as described in Summers and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987), incorporated herein by reference. As used herein, an insect cell capable of expressing a polynucleotide of the present invention is "transformed." 

The protein of the invention may be prepared by culturing transformed host cells under culture conditions suitable to express the recombinant protein. The resulting expressed protein may then be purified from such culture (i.e., from culture medium or cell extracts) using known purification processes, such as gel filtration and ion exchange chromatography. The purification of the protein may also include an affinity column containing agents which will bind to the protein; one or more column steps over such affinity resins as concanavalin A-agarose, heparin-toyopearl® or Cibacron blue 3GA Sepharose®; one or more steps involving hydrophobic interaction chromatography using such resins as phenyl ether, butyl ether, or propyl ether; or immunoaffinity chromatography. 

Alternatively, the protein of the invention may also be expressed in a form which will facilitate purification. For example, it may be expressed as a fusion protein, such as those of maltose binding protein (MBP), glutathione-S-transferase (GST) or thioredoxin (TRX). Kits for expression and purification of such fusion proteins are commercially available from New England BioLab (Beverly, Mass.), Pharmacia (Piscataway, N.J.) and Invitrogen, respectively. The protein can also be tagged with an epitope and subsequently purified by using a specific antibody directed to such epitope. One such epitope ("Flag") is commercially available from Kodak (New Haven, Conn.). 

Finally, one or more reverse-phase high performance liquid chromatography (RP-HPLC) steps employing hydrophobic RP-HPLC media, e.g., silica gel having pendant methyl or other aliphatic groups, can be employed to further purify the protein. Some or all of the foregoing purification steps, in various combinations, can also be employed to provide a substantially homogeneous isolated recombinant protein. The protein thus purified is substantially free of other mammalian proteins and is defined in accordance with the present invention as an "isolated protein." 

The protein of the invention may also be expressed as a product of transgenic animals, e.g., as a component of the milk of transgenic cows, goats, pigs, or sheep which are characterized by somatic or germ cells containing a nucleotide sequence encoding the protein. 

The protein may also be produced by known conventional chemical synthesis. Methods for constructing the proteins of the present invention by synthetic means are known to those skilled in the art. The synthetically-constructed protein sequences, by virtue of sharing primary, secondary or tertiary structural and/or conformational characteristics with proteins may possess biological properties in common therewith, including protein activity. Thus, they may be employed as biologically active or immunological substitutes for natural, purified proteins in screening of therapeutic compounds and in immunological processes for the development of antibodies. 

The proteins provided herein also include proteins characterized by amino acid sequences similar to those of purified proteins but into which modifications are naturally provided or deliberately engineered. For example, modifications in the peptide or DNA sequences can be made by those skilled in the art using known techniques. Modifications of interest in the protein sequences may include the alteration, substitution, replacement, insertion or deletion of a selected amino acid residue in the coding sequence. For example, one or more of the cysteine residues may be deleted or replaced with another amino acid to alter the conformation of the molecule. Techniques for such alteration, substitution, replacement, insertion or deletion are well known to those skilled in the art (see, e.g., U.S. Pat. No. 4,518,584). Preferably, such alteration, substitution, replacement, insertion or deletion retains the desired activity of the protein. 

Other fragments and derivatives of the sequences of proteins which would be expected to retain protein activity in whole or in part and may thus be useful for screening or other immunological methodologies may also be easily made by those skilled in the art given the disclosures herein. Such modifications are believed to be encompassed by the present invention. 

USES AND BIOLOGICAL ACTIVITY 

The polynucleotides and proteins of the present invention are expected to exhibit one or more of the uses or biological activities (including those associated with assays cited herein) identified below. Uses or activities described for proteins of the present invention may be provided by administration or use of such proteins or by administration or use of polynucleotides encoding such proteins (such as, for example, in gene therapies or vectors suitable for introduction of DNA). 

Research Uses and Utilities 

The polynucleotides provided by the present invention can be used by the research community for various purposes. The polynucleotides can be used to express recombinant protein for analysis, characterization or therapeutic use; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in disease states); as molecular weight markers on Southern gels; as chromosome markers or tags (when labeled) to identify chromosomes or to map related gene positions; to compare with endogenous DNA sequences in patients to identify potential genetic disorders; as probes to hybridize and thus discover novel, related DNA sequences; as a source of information to derive PCR primers for genetic fingerprinting; as a probe to "subtract-out" known sequences in the process of discovering other novel polynucleotides; for selecting and making oligomers for attachment to a "gene chip" or other support, including for examination of expression patterns; to raise anti-protein antibodies using DNA immunization techniques; and as an antigen to raise anti-DNA antibodies or elicit another immune response. Where the polynucleotide encodes a protein which binds or potentially binds to another protein (such as, for example, in a receptor-ligand interaction), the polynucleotide can also be used in interaction trap assays (such as, for example, that described in Gyuris et al., Cell 75:791-803 (1993)) to identify polynucleotides encoding the other protein with which binding occurs or to identify inhibitors of the binding interaction. 
The proteins provided by the present invention can similarly be used in assays to determine biological activity, including in a panel of multiple proteins for high-throughput screening; to raise antibodies or to elicit another immune response; as a reagent (including the labeled reagent) in assays designed to quantitatively determine levels of the protein (or its receptor) in biological fluids; as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in a disease state); and of course, to isolate correlative receptors or ligands. Where the protein binds or potentially binds to another protein (such as, for example, in a receptor-ligand interaction), the protein can be used to identify the other protein with which binding occurs or to identify inhibitors of the binding interaction. Proteins involved in these binding interactions can also be used to screen for peptide or small molecule inhibitors or agonists of the binding interaction. 

Any or all of these research utilities are capable of being developed into reagent grade or kit format for commercialization as research products. 

Methods for performing the uses listed above are well known to those skilled in the art. References disclosing such methods include without limitation "Molecular Cloning: A Laboratory Manual", 2d ed., Cold Spring Harbor Laboratory Press, Sambrook, J., E. F. Fritsch and T. Maniatis eds., 1989, and "Methods in Enzymology: Guide to Molecular Cloning Techniques", Academic Press, Berger, S. L. and A. R. Kimmel eds., 1987. 

Nutritional Uses 

Polynucleotides and proteins of the present invention can also be used as nutritional sources or supplements. Such uses include without limitation use as a protein or amino acid supplement, use as a carbon source, use as a nitrogen source and use as a source of carbohydrate. In such cases the protein or polynucleotide of the invention can be added to the feed of a particular organism or can be administered as a separate solid or liquid preparation, such as in the form of powder, pills, solutions, suspensions or capsules. In the case of microorganisms, the protein or polynucleotide of the invention can be added to the medium in or on which the 

nol. 145:1706-1712, 1990; Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Bertagnolli, et al., J. Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol. 152:1756-1761, 1994. 

Assays for cytokine production and/or proliferation of spleen cells, lymph node cells or thymocytes include, without limitation, those described in: Polyclonal T cell stimulation, Kruisbeek, A. M. and Shevach, E. M. In Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 3.12.1-3.12.14, John Wiley and Sons, Toronto. 1994; and Measurement of mouse and human Interferon γ, Schreiber, R. D. In Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.8.1-6.8.8, John Wiley and Sons, Toronto. 1994. 

Assays for proliferation and differentiation of hematopoietic and lymphopoietic cells include, without limitation, those described in: Measurement of Human and Murine Interleukin 2 and Interleukin 4, Bottomly, K., Davis, L. S. and Lipsky, P. E. In Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and Sons, Toronto. 1991; deVries et al., J. Exp. Med. 173:1205-1211, 1991; Moreau et al., Nature 336:690-692, 1988; Greenberger et al., Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938, 1983; Measurement of mouse and human interleukin 6, Nordan, R. In Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.6.1-6.6.5, John Wiley and Sons, Toronto. 1991; Smith et al., Proc. Natl. Acad. Sci. U.S.A. 83:1857-1861, 1986; Measurement of human Interleukin 11, Bennett, F., Giannotti, J., Clark, S. C. and Turner, K. J. In Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.15.1, John Wiley and Sons, Toronto. 1991; Measurement of mouse and human Interleukin 9, Ciarletta, A., Giannotti, J., Clark, S. C. and Turner, K. J. In Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.13.1, John Wiley and Sons, Toronto. 1991. 

Assays for T-cell clone responses to antigens (which will identify, among others, proteins that affect APC-T cell interactions as well as direct T-cell effects by measuring proliferation and cytokine production) include, without limitation, those described in: Current Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. H. 

microorganism is cultured. Margulies, & M. Shevach, W Strober, PuK Greene Pubhsh- 

Cytokine and Cell Prolfferation/Differentiation Activity ing Associates and Wey-Intersaence (Chapter 3, In vitro 

A protein of the present invention may exhibit cytokine, assays for Mouse Lymphocyte Function; Chapter 6, Cytok- 

cell proliferation (either inducing or inhibiting) or cell 45 ines and their cellular receptors; Chapter 7, Immunologic 

differentiation (either inducing or iimibiting) activity or may studies in Humans); Weinberger et al., Proc. Nad. Acad. Sa. 

induce production of other cytokines in certain cell popu- USA 77:6091-6095, 1980; Weinberger et ^^'J^^^' 

lations. Many protein factors discovered to date, including 11:405-411, 1981;Takai et aL, J. ^^l 37 :^ 4 - 3500 ' 

all known cytokines, have exhibited activity in one or more 1986; Takai et al., J. Immunol. 140:508-512, 1988. 

factor dependent cell proliferation assays, and hence the 50 Immune Stimulating or Suppressing Activity 

assays serve as a convenient coiifirmation of cytokine activ- A protein of the present invention may also exhibit 

ity The activity of a protein of the present invention is immune stimulating or immune suppressing activity, includ- 

evidenced by any one of a number of routine factor depen- ing without limitation the activities for which assays are 

dent cell proliferation assays for cell lines including, without described herein. Aprotein may be useful in the treatment of 

limitation 32D,DA2, DAlG,T10,B9,B9/ll,BaF3,MC9/ 55 various immune deficiencies and disorders (mcluding severe 

G, M-KpreB M+), 2E8, RB5, DAI, 123, T1165, HT2, combined immunodeficiency (SOD)) ? e.g. t in regulating (up 

CTLL2 TF-1, Mo7e and CMK. or down) growth and proliferation of T and/or B 

The activity of a protein of the invention may, among lymphocytes, as well as effecting the cytolytic activity of 

other means, be measured by the following methods: NK cells and other cell populations. These mirnune defi- 

Assays for T-cell or thymocyte proliferation include with- 60 ciencies may be genetic or be caused by vital (e.g., HIV) as 

ut limitation those described in: Current Protocols in well as bacterial or fungal infections, or may result from 



Immunology, Ed by J. E Coligan, A. M. Kruisbeek, D. H. autoimmune disorders. More specifically, infectious dis- 

Margulies, E. M. Shevach. W. Strober, Pub. Greene Pub- eases causes by viral, bacterial, fungal or other infection 

lishing Associates and WUey-Interscience (Chapter 3, la may be treatable using a protein of the present invention, 

Vitro assays for Mouse Lymphocyte Function 3.1-3.19; 65 including infections by HIV, hepatitis viruses, herpesviruses, 

Chapter 7, Immunologic studies in Humans); Takai et al. J. mycobacteria, Leishmania spp., malaria spp. and vanous 

Immunol 137*3494-3500, 1986; Bertagnolli et aL, J. Immu- fungal infections such as candidiasis. Of course, m this 



5,6f 

13 

regard, a protein of the present invention may also be useful 
where a boost to the immune system generally may be 
desirable, i.e., in the treatment of cancer. 

Autoimmune disorders which may be treated using a 
protein of the present invention include, for example, con- 
nective tissue disease, multiple sclerosis, systemic lupus 
erythematosus, rheumatoid arthritis, autoimmune pulmo- 
nary inflammation, Guillain-Barre syndrome, autoimmune 
thyroiditis, insulin dependent diabetes mellitus, myasthenia
gravis, graft-versus-host disease and autoimmune inflam- 
matory eye disease. Such a protein of the present invention 
may also be useful in the treatment of allergic reactions
and conditions, such as asthma (particularly allergic asthma) 
or other respiratory problems. Other conditions, in which 
immune suppression is desired (including, for example, 
organ transplantation), may also be treatable using a protein 
of the present invention. 

Using the proteins of the invention it may also be possible
to regulate immune responses in a number of ways. Down regulation
may be in the form of inhibiting or blocking an immune 
response already in progress or may involve preventing the 
induction of an immune response. The functions of activated 
T cells may be inhibited by suppressing T cell responses or 
by inducing specific tolerance in T cells, or both. Immuno- 
suppression of T cell responses is generally an active, 
non-antigen-specific, process which requires continuous 
exposure of the T cells to the suppressive agent. Tolerance,
which involves inducing non-responsiveness or anergy in T 
cells, is distinguishable from immunosuppression in that it is 
generally antigen-specific and persists after exposure to the 
tolerizing agent has ceased. Operationally, tolerance can be 
demonstrated by the lack of a T cell response upon reexpo-
sure to specific antigen in the absence of the tolerizing agent.

Down regulating or preventing one or more antigen 
functions (including without limitation B lymphocyte anti-
gen functions (such as, for example, B7)), e.g., preventing
high level lymphokine synthesis by activated T cells, will be 
useful in situations of tissue, skin and organ transplantation 
and in graft-versus-host disease (GVHD). For example, 
blockage of T cell function should result in reduced tissue 
destruction in tissue transplantation. Typically, in tissue 
transplants, rejection of the transplant is initiated through its 
recognition as foreign by T cells, followed by an immune 
reaction that destroys the transplant. The administration of a 
molecule which inhibits or blocks interaction of a B7 
lymphocyte antigen with its natural ligand(s) on immune 
cells (such as a soluble, monomeric form of a peptide having
B7-2 activity alone or in conjunction with a monomeric
form of a peptide having an activity of another B lympho- 
cyte antigen (e.g., B7-1, B7-3) or blocking antibody), prior 
to transplantation can lead to the binding of the molecule to 
the natural ligand(s) on the immune cells without transmit-
ting the corresponding costimulatory signal. Blocking B
lymphocyte antigen function in this manner prevents cytokine
synthesis by immune cells, such as T cells, and thus acts as 
an immunosuppressant. Moreover, the lack of costimulation 
may also be sufficient to anergize the T cells, thereby 
inducing tolerance in a subject. Induction of long-term
tolerance by B lymphocyte antigen-blocking reagents may 
avoid the necessity of repeated administration of these 
blocking reagents. To achieve sufficient immunosuppression 
or tolerance in a subject, it may also be necessary to block 
the function of a combination of B lymphocyte antigens. 

The efficacy of particular blocking reagents in preventing 
organ transplant rejection or GVHD can be assessed using 
animal models that are predictive of efficacy in humans. 
Examples of appropriate systems which can be used include
allogeneic cardiac grafts in rats and xenogeneic pancreatic
islet cell grafts in mice, both of which have been used to
examine the immunosuppressive effects of CTLA4Ig fusion
proteins in vivo as described in Lenschow et al., Science
257:789-792 (1992) and Turka et al., Proc. Natl. Acad. Sci.
USA, 89:11102-11105 (1992). In addition, murine models
of GVHD (see Paul ed., Fundamental Immunology, Raven
Press, New York, 1989, pp. 846-847) can be used to
determine the effect of blocking B lymphocyte antigen
function in vivo on the development of that disease.

Blocking antigen function may also be therapeutically
useful for treating autoimmune diseases. Many autoimmune
disorders are the result of inappropriate activation of T cells
that are reactive against self tissue and which promote the
production of cytokines and autoantibodies involved in the
pathology of the diseases. Preventing the activation of
autoreactive T cells may reduce or eliminate disease symp-
toms. Administration of reagents which block costimulation
of T cells by disrupting receptor:ligand interactions of B
lymphocyte antigens can be used to inhibit T cell activation
and prevent production of autoantibodies or T cell-derived
cytokines which may be involved in the disease process.
Additionally, blocking reagents may induce antigen-specific
tolerance of autoreactive T cells which could lead to long-
term relief from the disease. The efficacy of blocking
reagents in preventing or alleviating autoimmune disorders
can be determined using a number of well-characterized
animal models of human autoimmune diseases. Examples
include murine experimental autoimmune encephalitis, sys-
temic lupus erythematosus in MRL/lpr/lpr mice or NZB
hybrid mice, murine autoimmune collagen arthritis, diabetes
mellitus in NOD mice and BB rats, and murine experimental
myasthenia gravis (see Paul ed., Fundamental Immunology,
Raven Press, New York, 1989, pp. 840-856).

Upregulation of an antigen function (preferably a B
lymphocyte antigen function), as a means of up regulating
immune responses, may also be useful in therapy. Upregu-
lation of immune responses may be in the form of enhancing
an existing immune response or eliciting an initial immune
response. For example, enhancing an immune response
through stimulating B lymphocyte antigen function may be
useful in cases of viral infection. In addition, systemic viral
diseases such as influenza, the common cold, and encepha-
litis might be alleviated by the administration of stimulatory
forms of B lymphocyte antigens systemically.

Alternatively, anti-viral immune responses may be
enhanced in an infected patient by removing T cells from the
patient, costimulating the T cells in vitro with viral antigen-
pulsed APCs either expressing a peptide of the present
invention or together with a stimulatory form of a soluble
peptide of the present invention and reintroducing the in
vitro activated T cells into the patient. Another method of
enhancing anti-viral immune responses would be to isolate
infected cells from a patient, transfect them with a nucleic
acid encoding a protein of the present invention as described
herein such that the cells express all or a portion of the
protein on their surface, and reintroduce the transfected cells
into the patient. The infected cells would now be capable of
delivering a costimulatory signal to, and thereby activate, T
cells in vivo.

In another application, up regulation or enhancement of 
antigen function (preferably B lymphocyte antigen function) 
may be useful in the induction of tumor immunity. Tumor 
cells (e.g., sarcoma, melanoma, lymphoma, leukemia,
neuroblastoma, carcinoma) transfected with a nucleic acid
encoding at least one peptide of the present invention can be 
administered to a subject to overcome tumor-specific
tolerance in the subject. If desired, the tumor cell can be
transfected to express a combination of peptides. For
example, tumor cells obtained from a patient can be
transfected ex vivo with an expression vector directing the
expression of a peptide having B7-2-like activity alone, or in
conjunction with a peptide having B7-1-like activity and/or
B7-3-like activity. The transfected tumor cells are returned
to the patient to result in expression of the peptides on the
surface of the transfected cell. Alternatively, gene therapy
techniques can be used to target a tumor cell for transfection
in vivo.

The presence of the peptide of the present invention
having the activity of a B lymphocyte antigen(s) on the
surface of the tumor cell provides the necessary
costimulation signal to T cells to induce a T cell mediated immune
response against the transfected tumor cells. In addition,
tumor cells which lack MHC class I or MHC class II
molecules, or which fail to reexpress sufficient amounts of
MHC class I or MHC class II molecules, can be transfected
with nucleic acid encoding all or a portion of (e.g., a
cytoplasmic-domain truncated portion) of an MHC class I α
chain protein and β2 microglobulin protein or an MHC class
II α chain protein and an MHC class II β chain protein to
thereby express MHC class I or MHC class II proteins on the
cell surface. Expression of the appropriate class I or class II
MHC in conjunction with a peptide having the activity of a
B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T
cell mediated immune response against the transfected
tumor cell. Optionally, a gene encoding an antisense
construct which blocks expression of an MHC class II
associated protein, such as the invariant chain, can also be
cotransfected with a DNA encoding a peptide having the activity of
a B lymphocyte antigen to promote presentation of tumor
associated antigen and induce tumor specific immunity.
Thus, the induction of a T cell mediated immune response in
a human subject may be sufficient to overcome
tumor-specific tolerance in the subject.

The activity of a protein of the invention may, among
other means, be measured by the following methods:

Suitable assays for thymocyte or splenocyte cytotoxicity
include, without limitation, those described in: Current
Protocols in Immunology, Ed by J. E. Coligan, A. M.
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober,
Pub. Greene Publishing Associates and Wiley-Interscience
(Chapter 3, In Vitro assays for Mouse Lymphocyte Function
3.1-3.19; Chapter 7, Immunologic studies in Humans);
Herrmann et al., Proc. Natl. Acad. Sci. USA 78:2488-2492,
1981; Herrmann et al., J. Immunol. 128:1968-1974, 1982;
Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et al.,
J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol.
140:508-512, 1988; Herrmann et al., Proc. Natl. Acad. Sci.
USA 78:2488-2492, 1981; Herrmann et al., J. Immunol.
128:1968-1974, 1982; Handa et al., J. Immunol.
135:1564-1572, 1985; Takai et al., J. Immunol.
137:3494-3500, 1986; Bowman et al., J. Virology
61:1992-1998; Takai et al., J. Immunol. 140:508-512, 1988;
Bertagnolli et al., Cellular Immunology 133:327-341, 1991;
Brown et al., J. Immunol. 153:3079-3092, 1994.

Assays for T-cell-dependent immunoglobulin responses
and isotype switching (which will identify, among others,
proteins that modulate T-cell dependent antibody responses
and that affect Th1/Th2 profiles) include, without limitation,
those described in: Maliszewski, J. Immunol.
144:3028-3033, 1990; and Assays for B cell function: In
vitro antibody production, Mond, J. J. and Brunswick, M. In
Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol
1 pp. 3.8.1-3.8.16, John Wiley and Sons, Toronto. 1994.

Mixed lymphocyte reaction (MLR) assays (which will
identify, among others, proteins that generate predominantly
Th1 and CTL responses) include, without limitation, those
described in: Current Protocols in Immunology, Ed by J. E.
Coligan, A. M. Kruisbeek, D. H. Margulies, E. M. Shevach,
W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse
Lymphocyte Function 3.1-3.19; Chapter 7, Immunologic studies in
Humans); Takai et al., J. Immunol. 137:3494-3500, 1986;
Takai et al., J. Immunol. 140:508-512, 1988; Bertagnolli et
al., J. Immunol. 149:3778-3783, 1992.

Dendritic cell-dependent assays (which will identify,
among others, proteins expressed by dendritic cells that
activate naive T-cells) include, without limitation, those
described in: Guery et al., J. Immunol. 134:536-544, 1995;
Inaba et al., Journal of Experimental Medicine
173:549-559, 1991; Macatonia et al., Journal of
Immunology 154:5071-5079, 1995; Porgador et al., Journal of
Experimental Medicine 182:255-260, 1995; Nair et al.,
Journal of Virology 67:4062-4069, 1993; Huang et al.,
Science 264:961-965, 1994; Macatonia et al., Journal of
Experimental Medicine 169:1255-1264, 1989; Bhardwaj et
al., Journal of Clinical Investigation 94:797-807, 1994; and
Inaba et al., Journal of Experimental Medicine
172:631-640, 1990.

Assays for lymphocyte survival/apoptosis (which will
identify, among others, proteins that prevent apoptosis after
superantigen induction and proteins that regulate
lymphocyte homeostasis) include, without limitation, those
described in: Darzynkiewicz et al., Cytometry 13:795-808,
1992; Gorczyca et al., Leukemia 7:659-670, 1993;
Gorczyca et al., Cancer Research 53:1945-1951, 1993; Itoh et al.,
Cell 66:233-243, 1991; Zacharchuk, Journal of
Immunology 145:4037-4045, 1990; Zamai et al., Cytometry
14:891-897, 1993; Gorczyca et al., International Journal of
Oncology 1:639-648, 1992.

Assays for proteins that influence early steps of T-cell
commitment and development include, without limitation,
those described in: Antica et al., Blood 84:111-117, 1994;
Fine et al., Cellular Immunology 155:111-122, 1994; Galy
et al., Blood 85:2770-2778, 1995; Toki et al., Proc. Nat.
Acad. Sci. USA 88:7548-7551, 1991.

Hematopoiesis Regulating Activity

A protein of the present invention may be useful in
regulation of hematopoiesis and, consequently, in the
treatment of myeloid or lymphoid cell deficiencies. Even
marginal biological activity in support of colony forming cells
or of factor-dependent cell lines indicates involvement in
regulating hematopoiesis, e.g. in supporting the growth and
proliferation of erythroid progenitor cells alone or in
combination with other cytokines, thereby indicating utility, for
example, in treating various anemias or for use in
conjunction with irradiation/chemotherapy to stimulate the
production of erythroid precursors and/or erythroid cells; in
supporting the growth and proliferation of myeloid cells such as
granulocytes and monocytes/macrophages (i.e., traditional
CSF activity) useful, for example, in conjunction with
chemotherapy to prevent or treat consequent
myelo-suppression; in supporting the growth and proliferation of
megakaryocytes and consequently of platelets thereby
allowing prevention or treatment of various platelet
disorders such as thrombocytopenia, and generally for use in
place of or complementary to platelet transfusions; and/or in
supporting the growth and proliferation of hematopoietic
stem cells which are capable of maturing to any and all of
the above-mentioned hematopoietic cells and therefore find
therapeutic utility in various stem cell disorders (such as
those usually treated with transplantation, including, without
limitation, aplastic anemia and paroxysmal nocturnal
hemoglobinuria), as well as in repopulating the stem cell
compartment post irradiation/chemotherapy, either in-vivo
or ex-vivo (i.e., in conjunction with bone marrow
transplantation or with peripheral progenitor cell transplantation
(homologous or heterologous)) as normal cells or
genetically manipulated for gene therapy.

The activity of a protein of the invention may, among
other means, be measured by the following methods:

Suitable assays for proliferation and differentiation of
various hematopoietic lines are cited above.

Assays for embryonic stem cell differentiation (which will
identify, among others, proteins that influence embryonic
differentiation hematopoiesis) include, without limitation,
those described in: Johansson et al., Cellular Biology
15:141-151, 1995; Keller et al., Molecular and Cellular
Biology 13:473-486, 1993; McClanahan et al., Blood
81:2903-2915, 1993.

Assays for stem cell survival and differentiation (which
will identify, among others, proteins that regulate
lympho-hematopoiesis) include, without limitation, those described
in: Methylcellulose colony forming assays, Freshney, M. G.
In Culture of Hematopoietic Cells. R. I. Freshney, et al. eds.
Vol pp. 265-268, Wiley-Liss, Inc., New York, N.Y. 1994;
Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911,
1992; Primitive hematopoietic colony forming cells with
high proliferative potential, McNiece, I. K. and Briddell, R.
A. In Culture of Hematopoietic Cells. R. I. Freshney, et al.
eds. Vol pp. 23-39, Wiley-Liss, Inc., New York, N.Y. 1994;
Neben et al., Experimental Hematology 22:353-359, 1994;
Cobblestone area forming cell assay, Ploemacher, R. E. In
Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol
pp. 1-21, Wiley-Liss, Inc., New York, N.Y. 1994; Long term
bone marrow cultures in the presence of stromal cells,
Spooncer, E., Dexter, M. and Allen, T. In Culture of
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 163-179,
Wiley-Liss, Inc., New York, N.Y. 1994; Long term culture
initiating cell assay, Sutherland, H. J. In Culture of
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 139-162,
Wiley-Liss, Inc., New York, N.Y. 1994.

Tissue Growth Activity

A protein of the present invention also may have utility in
compositions used for bone, cartilage, tendon, ligament
and/or nerve tissue growth or regeneration, as well as for
wound healing and tissue repair and replacement, and in the
treatment of burns, incisions and ulcers.

A protein of the present invention, which induces
cartilage and/or bone growth in circumstances where bone is not
normally formed, has application in the healing of bone
fractures and cartilage damage or defects in humans and
other animals. Such a preparation employing a protein of the
invention may have prophylactic use in closed as well as
open fracture reduction and also in the improved fixation of
artificial joints. De novo bone formation induced by an
osteogenic agent contributes to the repair of congenital,
trauma induced, or oncologic resection induced craniofacial
defects, and also is useful in cosmetic plastic surgery.

A protein of this invention may also be used in the
treatment of periodontal disease, and in other tooth repair
processes. Such agents may provide an environment to
attract bone-forming cells, stimulate growth of
bone-forming cells or induce differentiation of progenitors of
bone-forming cells. A protein of the invention may also be
useful in the treatment of osteoporosis or osteoarthritis, such
as through stimulation of bone and/or cartilage repair or by
blocking inflammation or processes of tissue destruction
(collagenase activity, osteoclast activity, etc.) mediated by
inflammatory processes.

Another category of tissue regeneration activity that may be
attributable to the protein of the present invention is
tendon/ligament formation. A protein of the present
invention, which induces tendon/ligament-like tissue or
other tissue formation in circumstances where such tissue is
not normally formed, has application in the healing of
tendon or ligament tears, deformities and other tendon or
ligament defects in humans and other animals. Such a
preparation employing a tendon/ligament-like tissue
inducing protein may have prophylactic use in preventing damage
to tendon or ligament tissue, as well as use in the improved
fixation of tendon or ligament to bone or other tissues, and
in repairing defects to tendon or ligament tissue. De novo
tendon/ligament-like tissue formation induced by a
composition of the present invention contributes to the repair of
congenital, trauma induced, or other tendon or ligament
defects of other origin, and is also useful in cosmetic plastic
surgery for attachment or repair of tendons or ligaments. The
compositions of the present invention may provide an
environment to attract tendon- or ligament-forming cells, stimulate
growth of tendon- or ligament-forming cells, induce
differentiation of progenitors of tendon- or ligament-forming
cells, or induce growth of tendon/ligament cells or
progenitors ex vivo for return in vivo to effect tissue repair. The
compositions of the invention may also be useful in the
treatment of tendinitis, carpal tunnel syndrome and other
tendon or ligament defects. The compositions may also
include an appropriate matrix and/or sequestering agent as a
carrier as is well known in the art.

The protein of the present invention may also be useful for
proliferation of neural cells and for regeneration of nerve
and brain tissue, i.e. for the treatment of central and
peripheral nervous system diseases and neuropathies, as well as
mechanical and traumatic disorders which involve
degeneration, death or trauma to neural cells or nerve tissue.
More specifically, a protein may be used in the treatment of
diseases of the peripheral nervous system, such as peripheral
nerve injuries, peripheral neuropathy and localized
neuropathies, and central nervous system diseases, such as
Alzheimer's, Parkinson's disease, Huntington's disease,
amyotrophic lateral sclerosis, and Shy-Drager syndrome.
Further conditions which may be treated in accordance with
the present invention include mechanical and traumatic
disorders, such as spinal cord disorders, head trauma and
cerebrovascular diseases such as stroke. Peripheral
neuropathies resulting from chemotherapy or other medical
therapies may also be treatable using a protein of the invention.

Proteins of the invention may also be useful to promote
better or faster closure of non-healing wounds, including
without limitation pressure ulcers, ulcers associated with
vascular insufficiency, surgical and traumatic wounds, and
the like.

It is expected that a protein of the present invention may
also exhibit activity for generation or regeneration of other
tissues, such as organs (including, for example, pancreas,
liver, intestine, kidney, skin, endothelium), muscle (smooth,
skeletal or cardiac) and vascular (including vascular
endothelium) tissue, or for promoting the growth of cells
comprising such tissues. Part of the desired effects may be
by inhibition or modulation of fibrotic scarring to allow
normal tissue to regenerate. A protein of the invention may
also exhibit angiogenic activity.

A protein of the present invention may also be useful for
gut protection or regeneration and treatment of lung or liver
fibrosis, reperfusion injury in various tissues, and conditions
resulting from systemic cytokine damage.
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Aproteinof the present invention may also be useful for Pf^^^^^^l^^f^ 
promoting or inhibiting differentiation of tissues described of cells can be readily determined 
Love h 4 precursor Lues c cells; or for inhibiting the P^n orpeptide -^£25 

*^:SV?Z£LTl invention may, among 5 o^X^ b y^^^ 

otomeSe measured by the following methods:. .Assays for chemotactic activity (which 

Assays for tissue generation activity include, without teins that induce or prevent chemotaas) consist of assays 

limSo . Ascribed in: International Patent Public- that measure me abilfly of a 

tion No WO95/16035 (bone, cartilage, tendon); Interna- of cells across a membrane as weU as the ability of aprotein 

Sonal Patent ■ pSSmSTno. WO95/05846 (nerve, 10 to induce the adhesion of one cell population to an°thercell 

No. W091/ population. Suitable assays for mo = t , ^d adhesion 

mAaltMn Pn ^ ^ f ^thp1i^TTl^ include, without limitation, those described in. Current 

71382-^4 Uy/8J. ImmunoL 25:1744-1748; Gruber et al. J. of ImmunoL 

T^ofT^sent invention may also exhibit 20 152:5860-5867 1994; Johnston et al. J. of ImmunoL 

activin- or inhibin-related activities. Inhibins are character- 153:1762-1768, 1994. 

ized by their ability to inhibit the release of follicle stimu- Hemostatic and Thrombolytic Activity .... „ r 

uSg honnone S), while activins and are characterized A protein of the invention may alsoexhibrt hemostabc or 

h^^pffiwn Thus a orotein of the present invention, 25 to be useful in treatment of various coagulation disorders 

family, may be useful as a contraceptive basedon the ability enhance coagulation and other hemostatic events in gating 
3ffi3?i decrease fertility I female mammals and wounds resulting from trauma, «W «J«f«™*£ 
decrease spermatogenesis in irJemarnmals. Administration protein of the mvention may ako be 
ofTuMdentamouLofotheriiiMbinscaninduceMertmty ao inhibiting formation of toontosKudteJ^* 
in toese mammals Alternatively, the F<>tein of the prevention of conditions resulting therefrom (such as, for 
Z^n.T^^oresA^cL.r^^ exarnple, Marction of cardiac and central nervous system 
nrotein subunits of the inhibin-B group, may be useful as a vessels (e.g., stroke). 

S^Sg therapeutic, baled upon the ability of The activity of a ^"j^^ s mg 
activin molecules in stimulating FSH release from cells of 35 other means, be measured by the foUowmg 
fte alteS piSry. See. for example, U.S. Pat No. 4.798, Assay for hemosfctic and thrombolytic activity incite, 
885 A orotein of the invention may also be useful for without limitation, those described in: Lmet et aL, J. Chn. 
advantSof tie oLet7fatil£ in sexually immature Pharmacol. 26:131-140, 1986; Burdick et al., Thrombose 
mSSsoa, To increase the lifetime reproductive per- Res. 45:413-419 1987; Hu^hrey et : UJojn 
forrnanceofdomesticanimalssuchascows,sheepand P igs. <o 5:71-79 (1991); Schaub Prostaglandins 35:467-474, 1988. 

The activity of a protein of the invention may, among Receptor/Ligand Activity „ 5 „, 1tn , m „ nttete 
other means, be measured by the following methods: A protein of the present hwention may 

Assays for activinAnhibin activity include, without activity as receptors, receptor ligands "^draor ago- 
limitation. those described in: Vale et aL, Endocrinology nists of receptor/ligand interactions^ feamptes such 
91-562-572, 1972; Ling et al., Nature 321:779-782, 1986; 45 receptors and ligands include, without limitation, cytokme 
Vale et al Nature 321:776-779, 1986; Mason et aL, Nature receptors and their ligands, receptor kmases and their 
3?-6?9-663 mst Fomge et al., Proc. Natl. Acad. Sci. ligands, receptor phosphatases and flor ^ 
ttq a inQS involvedinceU-ceUinteractionsandtheir ligands (including 

USA wJWWTO.woo. ...-,„ without limitation, cellular adhesion molecules (such as 

°7SStSS^S£SS. may have diemotaetic 50 SSs.ttegrins' and their lig^s ) and remand 
^ rhemoldnetic activity fee act as a chemokine) for pairs involved in antigen presentation, antigen recognition 
Z£S£?X%S£t£ example, monocles, Ld development of cellular and humoral f irnmune 
fibroblasts, neutrophils, T-cells, mast cells, eosinophils, epi- responses). ^^.ft^Z^^£r^fZ 
thelial and/or endothelial cells. Chemotactic and chemoki- ing of potential peptide or ^ * ™ 

netic proteins can be used to mobilize or attract a desired cell 55 relevant receptor/ligand interaction. A proton of the present 

SuSToaSS^^^ ^ oa ^f^°^T a '^Zi^f 

kinetic proteins provide particular advantages in treatment tors and ligands) may themselves be useful as inhibitors of 
of wounds and other trauma to tissues, as well as in receptor/ligand interactions. 

LSof locatized infections. For example, attraction of The activity of a protond "^"""^r"* 
lvmohocvtes monocytes or neutrophils to tumors or sites of 60 other means, be measured by toe foUowi^ methods. 

Z h7m«rm. infftctfno rant, out limitation those described m:Current Protocols in 

ofSdel s chemotactic activity for a par- Immunology, Ed by J. E. Coligan, A. M.Kruisbeek, E UL 
ticufcTceU potation if it can sttoulateTdirectty or Margulies, E. M. Shevach, W . Strober, Pub. Greene Pub- 
indirectly.thetoectedorientationormovementofsuchcell 65 lishing Associates and ^'^T^tK^itifs 
Mnulation Preferably the protein or peptide has the ability Measurement of Cellular Adhesion under static caatoum 
KecS SSf dfrSed moveme« P of cells. Whether a 7.28.1-7.28.22), Takai et aL, Proc. Nafl. Acad. Sex. USA 
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qa.maa «««« 1087- m.,« .til I Exo Med. as an antigen in a vaccine composition to raise an immune 

!JS!S 1988; ' Rosenstein et al., J. Exp. Med. response gainst such protein or another material or entity 

169 149-160 1989; Stoltenborg et al., J. Immunol. Methods which is cross-reactive with such protein. 

175:59-68, 1994; Stitt et al, Cell 80:661-670, 1995. ADMINISTRATION AND DOSING 

Anti-Inflammatory Activity 3 

Proteins of the present invention may also exhibit anti- a protein of the present invention (from whatever source 
inflammatory activity. The anti-inflammatory activity may derived, including without limitation from recombinant and 
be achieved by providing a stimulus to cells involved in the n0 n-recombinant sources) may be used in a pharmaceutical 
Mammatory response, by inhibiting or promoting cell-cell composition when combined with a pharmaceutically 
interactions (such as, for example, cell adhesion), by inhib- 10 accepta y e carrier. Such a composition may also contain (in 
iting or promoting chemotaxis of cells involved in the addition to protein and a carrier) diluents, fillers, salts, 
inflammatory process, inhibiting or promoting cell DU ffers, stabilizers, solubilizers, and other materials well 
extravasation, or by stimulating or suppressing production ^nsm in ^ ^ tenn pharmaceutically acceptable" 
of other factors which more directly inhibit or promote an means a n0 n-toxic material that does not interfere with the 
inflammatory response. Proteins exhibiting such activities 15 e ff ec tiveness of the biological activity of the active 
can be used to treat inflammatory conditions including ingredients). The characteristics of the carrier will depend 
chronic or acute conditions), including without limitation on ^ route of administration. The pharmaceutical compo- 
intimation associated with infection (such as septic shock, sition of ^ invention may also contain cytokines, 
sepsis or systemic inflammatory response syndrome (SIRS) lymphokines, or other hematopoietic factors such as M-CSF, 
), ischemia-reperfusion injury, endotoxin lethality, arthritis, 20 GM-CSF,TNF. IL- 1, IL-2, IL-3, IL-4 f IL-5 , IL-6, IL-7, IL-8, 
complement-mediated hyperacute rejection, nephritis, iL-9 t n^io, IH1, IL-12, 11^-13, IL^14,n^l5,IFN,TNF0, 
cytokine or chemokine-induced lung injury, inflammatory tnfI, TNF2, G-CSF, Meg-CSF, trffombopoietin, stem cell 
bowel disease, Crohn's disease or resulting from over pro- factor, and erythropoietin. The pharmaceutical composition 
duction of cytokines such as TNF or IL-1. Proteins of the ^ further contain other agents which either enhance the 
invention may also be useful to treat anaphylaxis and 25 ^y^y 0 f me protein or compliment its activity or use in 
hypersensitivity to an antigenic substance or material. treatment. Such additional factors and/or agents may be 
Tumor Inhibition Activity included in the pharmaceutical composition to produce a 
Id addition to the activities described above for immuno- synergistic effect with protein of the invention, or to mini- 
logical treatment or prevention of tumors, a protein of the si(Je effects. Conversely, protein of the present inven- 
invention may exhibit other anti-tumor activities. A protein 30 ^ ^ ^ included in formulations of the particular 
may inhibit tumor growth directly or indirectly (such as, for cytokine, lymphokine, other hematopoietic factor, throm- 
example, via ADCC). A protein may exhibit its tumor b 0 iyfc oran ti-trtfomboti 

inhibitory activity by acting on tumor tissue or tumor t0 m i n j m i 2e s id e effects of the cytokine, lympboMne, other 

precursor tissue, by inhibiting formation of tissues necessary hematopoietic factor, thrombolytic or anti-mrombotic factor, 

to support tumor growth (such as, for example, by inhibiting 35 or ^.inflammatory agent 

angiogenesis), by causing production of other factors, agents A otein of me pre sent invention may be active in 

or cell types which inhibit tumor growth, or by suppressing, multimers ( e<g ., heterodimers or homodimers) or complexes 

eliminating or inhibiting factors, agents or cell types which ^ ^ other proteins. As a result, pharmaceutical 

promote tumor growth. compositions of the invention may comprise a protein of the 

Other Activities 40 invention in such multimeric or complexed form. 

Aprotein of the inyentioninay H, c pharmaceutical composition of the invention may be 

ofmefoUowingaddmonalactivitiesOTeffects:iiinibltingthe rn JT nf the Drotrinfs) of oresent 

including, without limitation, bacteria, viruses • ^T^antigen will deU^er a stimulatory signal to 

other parasites; effecting (suppressing or _ enhancing) booity 45 ™ ^ jy^hocytes. B lymphocytes will respond to 

^^^^^ n ^^ b ^lf e ^gfntoughTeSace iSSbulin receptor. T 

hair color, eye color, skin, fat to lean ratio or other tissue f™*. » . ^ through the T cell 

pigmentation, or organ or body part size or shape (such as, J^^^XSng pLSon of the antigen by 

for example, breast augmentation or diminution, change in "^ to ^R> rtroSmW related proteins 

bone i™«^>?^} f ^^«^ * S^Sen^^^ 

cycles or rhythms; effecting the fertility of male or female s to -went the peptide antigen(s) to 

subjects; effecting the metabolism, catabohsn, anabohs^ "^^^Zgn KmpoSito couldVo be 

processing, utilization storage or elminahon of die^ry fat, J^gJ^J^ MHC^eptide Complexes alone or with 

Upid.protein.ca^r^dmte.vxtanunsmm ^ cc SuS<Smolecules that can directly signal T cells, 

other nutritional factors or components); effecting behav- 55 ^Ttemativelv antibodies able to bind surface immunoglobu- 

ioral characteristics, including without HMQ SSSSSSSSt «u7as well as antibodiSable 

libido, stress, cognition (mduding | ^ "J2 to ffd TtCR and other molecules on T cells can be 

tl^^JSSSt^X^ 

^Tt^^^^&rSS-, 60 ^pharn^ceuticalc^^^^ 
n ™alc?e7dS activity; in the caVe of enzymes, in the form of a Uposome in .which protein of the present 
SrecSg dffiSes of The enzyme and treating invention iscombined, in addMon to omerpharraac^tic^ 
&fidencj?related diseases; treatment of hyperproliferative acceptable carriers, with amphipattoc agents suA as hprfs 
disorders (such as for example, psoriasis); « which exist in aggregated form as micelles, insoluble 
So^buSfee activg (such as P for example, the monolayers, liquid I oystols ox 
ability tobmdanu^ns«complement);andtheabuity toad solution. Suitable hpids for liposomal formulation include, 
without limitation, monoglycerides, diglycerides, sulfatides,
lysolecithin, phospholipids, saponin, bile acids, and the like.
Preparation of such liposomal formulations is within the level of
skill in the art, as disclosed, for example, in U.S. Pat. No.
4,235,871, 4,501,728, 4,837,028 and 4,737,323, all of which are
incorporated herein by reference.

As used herein, the term "therapeutically effective amount" means the
total amount of each active component of the pharmaceutical
composition or method that is sufficient to show a meaningful patient
benefit, i.e., treatment, healing, prevention or amelioration of the
relevant medical condition, or an increase in rate of treatment,
healing, prevention or amelioration of such conditions. When applied
to an individual active ingredient, administered alone, the term
refers to that ingredient alone. When applied to a combination, the
term refers to combined amounts of the active ingredients that result
in the therapeutic effect, whether administered in combination,
serially or simultaneously.

In practicing the method of treatment or use of the present
invention, a therapeutically effective amount of protein of the
present invention is administered to a mammal having a condition to
be treated. Protein of the present invention may be administered in
accordance with the method of the invention either alone or in
combination with other therapies such as treatments employing
cytokines, lymphokines or other hematopoietic factors. When
co-administered with one or more cytokines, lymphokines or other
hematopoietic factors, protein of the present invention may be
administered either simultaneously with the cytokine(s),
lymphokine(s), other hematopoietic factor(s), thrombolytic or
anti-thrombotic factors, or sequentially. If administered
sequentially, the attending physician will decide on the appropriate
sequence of administering protein of the present invention in
combination with cytokine(s), lymphokine(s), other hematopoietic
factor(s), thrombolytic or anti-thrombotic factors.

Administration of protein of the present invention used in the
pharmaceutical composition or to practice the method of the present
invention can be carried out in a variety of conventional ways, such
as oral ingestion, inhalation, topical application or cutaneous,
subcutaneous, intraperitoneal, parenteral or intravenous injection.
Intravenous administration to the patient is preferred.

When a therapeutically effective amount of protein of the present
invention is administered orally, protein of the present invention
will be in the form of a tablet, capsule, powder, solution or elixir.
When administered in tablet form, the pharmaceutical composition of
the invention may additionally contain a solid carrier such as a
gelatin or an adjuvant. The tablet, capsule, and powder contain from
about 5 to 95% protein of the present invention, and preferably from
about 25 to 90% protein of the present invention. When administered
in liquid form, a liquid carrier such as water, petroleum, oils of
animal or plant origin such as peanut oil, mineral oil, soybean oil,
or sesame oil, or synthetic oils may be added. The liquid form of the
pharmaceutical composition may further contain physiological saline
solution, dextrose or other saccharide solution, or glycols such as
ethylene glycol, propylene glycol or polyethylene glycol. When
administered in liquid form, the pharmaceutical composition contains
from about 0.5 to 90% by weight of protein of the present invention,
and preferably from about 1 to 50% protein of the present invention.

When a therapeutically effective amount of protein of the present
invention is administered by intravenous, cutaneous or subcutaneous
injection, protein of the present invention will be in the form of a
pyrogen-free, parenterally acceptable aqueous solution. The
preparation of such parenterally acceptable protein solutions, having
due regard to pH, isotonicity, stability, and the like, is within the
skill in the art. A preferred pharmaceutical composition for
intravenous, cutaneous, or subcutaneous injection should contain, in
addition to protein of the present invention, an isotonic vehicle
such as Sodium Chloride Injection, Ringer's Injection, Dextrose
Injection, Dextrose and Sodium Chloride Injection, Lactated Ringer's
Injection, or other vehicle as known in the art. The pharmaceutical
composition of the present invention may also contain stabilizers,
preservatives, buffers, antioxidants, or other additives known to
those of skill in the art.

The amount of protein of the present invention in the pharmaceutical
composition of the present invention will depend upon the nature and
severity of the condition being treated, and on the nature of prior
treatments which the patient has undergone. Ultimately, the attending
physician will decide the amount of protein of the present invention
with which to treat each individual patient. Initially, the attending
physician will administer low doses of protein of the present
invention and observe the patient's response. Larger doses of protein
of the present invention may be administered until the optimal
therapeutic effect is obtained for the patient, and at that point the
dosage is not increased further. It is contemplated that the various
pharmaceutical compositions used to practice the method of the
present invention should contain about 0.01 µg to about 100 mg
(preferably about 0.1 µg to about 10 mg, more preferably about 0.1 µg
to about 1 mg) of protein of the present invention per kg body
weight.

The duration of intravenous therapy using the pharmaceutical
composition of the present invention will vary, depending on the
severity of the disease being treated and the condition and potential
idiosyncratic response of each individual patient. It is contemplated
that the duration of each application of the protein of the present
invention will be in the range of 12 to 24 hours of continuous
intravenous administration. Ultimately the attending physician will
decide on the appropriate duration of intravenous therapy using the
pharmaceutical composition of the present invention.

Protein of the invention may also be used to immunize animals to
obtain polyclonal and monoclonal antibodies which specifically react
with the protein. Such antibodies may be obtained using either the
entire protein or fragments thereof as an immunogen. The peptide
immunogens additionally may contain a cysteine residue at the
carboxyl terminus, and are conjugated to a hapten such as keyhole
limpet hemocyanin (KLH). Methods for synthesizing such peptides are
known in the art, for example, as in R. P. Merrifield, J. Amer. Chem.
Soc. 85, 2149-2154 (1963); J. L. Krstenansky, et al., FEBS Lett. 211,
10 (1987). Monoclonal antibodies binding to the protein of the
invention may be useful diagnostic agents for the immunodetection of
the protein. Neutralizing monoclonal antibodies binding to the
protein may also be useful therapeutics for both conditions
associated with the protein and also in the treatment of some forms
of cancer where abnormal expression of the protein is involved. In
the case of cancerous cells or leukemic cells, neutralizing
monoclonal antibodies against the protein may be useful in detecting
and preventing the metastatic spread of the cancerous cells, which
may be mediated by the protein.

For compositions of the present invention which are useful for bone,
cartilage, tendon or ligament regeneration,
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the therapeutic method includes administering the compo- f erred sequestering agents *<f^^ 

sition topically, systematically, or locally as an implant or alginate, poly(emylene gly^ 

device. When adininistered, the therapeutic composition for boxyviny polymer and poly(viny alcohol ^ The a^untof 

use in this invention is, of course, in a pyrogen-free, physi- sequestering agent useful herein is OS-20 wt : %, 

oloS acceptable form. Furtner, the composition may 5 1-10 wt % based on total formulation ^^J^ 

desh-ably be encapsulated or injected in a viscous form for resents the amount necessary to F"™^^ 

dehvery to the site of bone,, cartilage or tissue damage. protein from the polymer matrix and to ;F^eappr^nate 

Topic^adrninistration may be suitable for wound healing handling of the composition yet ^ » 

and tissue repair. Therapeutically useful agents other than a progenitor ceUs are prevented from 

wotein of the invention which may also optionally be 10 thereby providing the protein the opportunity to assist the 

included in the composition as described above, may alter- osteogenic activity of the progenitor cells. ^ 
natively or additionally, be administered simultaneously or further compositions, proteins of the invention may be 

sequentially with the composition in the methods of the combined with other agents beneficial to the treatment of the 

invention. Preferably for bone and/or cartilage formation, bone and/or cartilage defect, wound, or tissue in question, 

the composition would include a matrix capable of deliver- 15 These age nts include various growth factors such as epider- 

ing the protrin-containing composition to the site of bone mal growth factor (EGF), platelet derived growth factor 

and/or cartilage damage, providing a structure for the devel- (PDGF), transforming growth factors (TGF-a and TGF-0), 

oping bone and cartilage and optimally capable of being and insulin-like growth factor (IGF), 
resorbed into the body. Such matrices may be formed of ^ compositions are also presently valuable 

materials presently in use for other implanted medical 20 f<ff vetermary applications. Particularly domestic ammals 

applications. and thoroughbred horses, in addition to humans, are desired 

The choice of matrix material is based on patients for such treatment with proteins of the present 

biocompatibility, biodegradability, mechanical properties, invention. 

cosmetic appearance and interface properties. The particular ^ dosage regimen 0 f a protem-containing pharmaceu- 
application of the compositions will define the appropriate 25 ^ composition t0 be used in tissue regeneration will be 
formulation. Potential matrices for the compositions may be determined by the attending physician considering various 
biodegradable and chemically defined calcium sulfate, f actorsW hich modify the action of the proteins, e.g., amount 
tricaldiimphosphate, hydroxyapatite, polylactic acid, poly g- of we igh t desired to be formed, the site of damage, the 
lycolic acid and polyanhydrides. Other potential materials condition 0 f the damaged tissue, the size of a wound, type 
are biodegradable and biologically well-defined, such as of damage<1 tissue (e.g., bone), the patients age, sex, and 
bone or dermal collagen. Further matrices are comprised of ^ ^ severitv 0 f infection, time of administration and 
pure proteins or extracellular matrix components. Other clinical f actors . The dosage may vary with the type of 
potential matrices are nonbiodegradable and chemically matrix used in the reconstitution and with inclusion of other 
defined, such as sintered hydroxyapatite, bioglass, proems in mc pharmaceutical composition. For example, 
aluminates, or other ceramics. Matrices may be comprised ^ addition 0 f omer known growth factors, such as IGF I 
of combinations of any of the above mentioned types of ^ growth factor I), to the final composition, may 
material, such as polylactic acid and hydroxyapatite or ^ me ^sage. Progress can be monitored by peri- 
collagen and tricalciumphosphate. The bioceramics may be ^ asscssment 0 f tissue/bone growth and/or repair, for 
altered in composition, such as in caldum-aluminate- exam pi e , X-rays, histomorphometric determinations and tet- 
phosphate and processing to alter pore size, particle size, j^y^e labeling. 

particle shape, and biodegradability. Polynucleotides of the present invention can also be used 

Presently preferred is a 50:50 (mole weight) copolymer of for mcra py. Such polynucleotides can be introduced 

lactic acid and glycolic acid in the form of porous particles either m ^ or ^ viv0 mt0 cc rjs f or expression in a 
having diameters ranging from 150 to 800 microns. In some ^ |TigmTTiall - an sub ject Polynucleotides of the invention may 

applications, it will be useful to utilize a sequestering agent, ^ fee admmisterc<1 by omer known methods for introduc- 

such as carboxymethyl cellulose or autologous blood clot, to ^ q{ nudeic ^ feto a cell or organism (including, 
prevent the protein compositions from disassociating from limitation, in the form of viral vectors or naked 

the matrix. DNA). 

A preferred family of sequestering agents is ceUulosic M Q ^ ^ ^ calmtd ^ ^ k me presence of 
materials such as alkylcelluloses {including ^ of me resent mvention m cmler to proliferate ot to 

hydroxyalkylcelluloses), including methylcellulose, & ^ effect Qn w m ^ cells . Treated 

ethylcellulose, hydroxyethylcellulose, £ elkcanmenbem trc<u^ 

ionic salts of carboxymethylcdlulose (CMC). Other pre- rated by reference as if fully set forth. 



SEQUENCE LISTING 



(1) GENERAL INFORMATION:

(iii) NUMBER OF SEQUENCES: 11

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 432 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

GGTTTGAAAA CTCTGCTTCC TTTGTGAATT TGCTGTTAGG AGTTCTTATT GTTATTCTGC 60
AGCCTTTACT ATTGTCCTTT ATTTACTGAA CACAGTGAAT ACCAAGCACT GTTTATTAGA 120
GGTTAGGAGT AGGGGCAGGT GATTAAAAAA ACAAAAAAGC TAATAATCTC CTCAAGCAAT 180
TTCTGGCCTA ATAGAATTAT AGTAGACAGT GAAGTATCTA AACCCAGGGA ATCAGATTGA 240
GGCACCATGT CCATCGCCTT GAGAATTAAT AGGCTGCATT TCTGGGTTCT CCNTTTTTTT 300
TTTTTTTTTG CCCAACTGAG TCTTTCTGTG GACTTACATG GAACTTCTTA TTCTCTTAAA 360
TCATTAAGTT ACTTGACAAT ATTCTTGGAT TTGGAGAAAC TGGATGTAGG GCCGTATGAA 420
AAAATCATTC GA 432

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 62 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

Met Ser Ile Ala Leu Arg Ile Asn Arg Leu His Phe Trp Val Leu Xaa
1               5                   10                  15

Phe Phe Phe Phe Phe Ala Gln Leu Ser Leu Ser Val Asp Leu His Gly
            20                  25                  30

Thr Ser Tyr Ser Leu Lys Ser Leu Ser Tyr Leu Thr Ile Phe Leu Asp
        35                  40                  45

Leu Glu Lys Leu Asp Val Gly Pro Tyr Glu Lys Ile Ile Arg
    50                  55                  60

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 219 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

ATAGGATACN GTATCTNGCT TTTTTCATTT AAACGTCGNG AGCAATTTTC CCAAGACATA 60
ACAAACTGTC TTNGAAAAAN GGAAAACATT NGGGGCTGTC AGCANAACNG AAAATGTTTT 120
CTGGGTGAGA CACATGTATC TTNGNAATGG GTTGGATTIA GTGTGCTTTA TTTCAATAAA 180
AATTCAGTAT TATAATTTAA AAAAAAAAAA AAAAAAAAA 219

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 501 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

TCCACAGGTG TCCANTCCCA GGTCCAACTG CAGATTTCGA ATTCGGCCTT CATGGCCTAG 60
AGCGACGCGG AGAARAGCTC CGGGTGCCGC GGCACTGCAG CGCTGAGATT CCTTTACAAA 120
GAAACTCAGA GGACCGGGAA GAAAGAATTT CACCTTTGCG ACGTGCTAGA AAATAARGTC 180
GTCTGGGAAA AGGACTGGAG ACACAAGCGC ATCSCAASYY SRGTGAAGGA SAAASNGAKG 240
GANBTAKWWM MGWGSWGAAA AATKTYWWKC AAMMWMGGTA TTTTCCCTTG GATATTAACT 300
TGCATATCTG AAGAAATGGC ATTCCGGACA ATTTGCGTGT TGGTTGGAGT ATTTATTTGT 360
TCTATCTGTG TGAAAGGATC TTCCCAGCCC CAAGCAAGAG TTTATTTAAC ATTTGATGAA 420
CTTCGAGAAA CCAAGACCTC TGAATACTTC AGCCTTTCCC ACCATCCTTT AGACTACAGG 480
ATTTTATTAA TGGATGAAGA T 501

(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 62 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

Met Ala Phe Arg Thr Ile Cys Val Leu Val Gly Val Phe Ile Cys Ser
1               5                   10                  15

Ile Cys Val Lys Gly Ser Ser Gln Pro Gln Ala Arg Val Tyr Leu Thr
            20                  25                  30

Phe Asp Glu Leu Arg Glu Thr Lys Thr Ser Glu Tyr Phe Ser Leu Ser
        35                  40                  45

His His Pro Leu Asp Tyr Arg Ile Leu Leu Met Asp Glu Asp
    50                  55                  60

(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 302 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

CTAGCACTAG ACATGTCATG GTCTTCATGG TGCATATAAA TATATTTAAC TTAACCCAGA 60
TTTTATTTAT ATCTTTATTC ACCTTTTCTT CAAAATCGAT ATGGTGGCTG CAAAACTAGA 120
ATTGTTGCAT CCCTCAATNG AATGAGGGCC ATATCCCTGT GGTATTCCTT TCCTGCTTNG 180
GGGCTTTAGA ATTCTAATTG TCAGTGATTT TGTATATGAA AACAAGTTCC AAATCCACAG 240
CTTTTACGTA GTAAAAGTCA TAAATGCATA TGACAGAATG GCTATCAAAA GAAAAAAAAA 300
AA 302

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 448 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

GGCGAARGCA GCGGCAGGTC GGGAGCAARA TGGCGCTGCG GCCAGGAGCT GGTTCTGGTG 60
GCGGCGGGGC CGCGARGAKY ATRRYGYGRK KTYYRYYSKG KKWKSMGGST TCATGTTTCC 120
TGTTGCAGGT GGGATAAGAC CCCCTCAAGG CCTGATGCCG ATGCAGCAAC AAGGATTTCC 180
TATGGTCTCT GTCATGCAGC CTAATATGCA AGGCATTATG GGAATGAATT ACAGCTCTCA 240
GATGTCCCAA GGACCTATTG CTATGCAGGC AGGAATACCA ATGGGACCAA TGCCAGCAGC 300
GGGAATGCCT TACCTAGGAC AAGCACCCTT CCTGGGCATG CGTCCTCCAG GCCCACAGTA 360
CACTCCAGAC ATGCAGAAGC AGTTTGCCGA AGAGCAGCAG AAACGATTTG AACAGCAGCA 420
AAAACTCTTA GAAAAAAAAA AAAAAAAA 448

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 107 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS:
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Met Phe Pro Val Ala Gly Gly Ile Arg Pro Pro Gln Gly Leu Met Pro
1               5                   10                  15

Met Gln Gln Gln Gly Phe Pro Met Val Ser Val Met Gln Pro Asn Met
            20                  25                  30

Gln Gly Ile Met Gly Met Asn Tyr Ser Ser Gln Met Ser Gln Gly Pro
        35                  40                  45

Ile Ala Met Gln Ala Gly Ile Pro Met Gly Pro Met Pro Ala Ala Gly
    50                  55                  60

Met Pro Tyr Leu Gly Gln Ala Pro Phe Leu Gly Met Arg Pro Pro Gly
65                  70                  75                  80

Pro Gln Tyr Thr Pro Asp Met Gln Lys Gln Phe Ala Glu Gln Gln Gln
                85                  90                  95

Lys Arg Phe Glu Gln Gln Gln Lys Leu Leu Glu
            100                 105



(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

GNGCCTCAAT CTGATTCCCT GGGTTTAGA 29

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GNCCGGAATG CCATTTCTTC AGATATGCA 29

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: other nucleic acid

(A) DESCRIPTION: /desc = "oligonucleotide"

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

TNCCATTGGT ATTCCTGCCT GCATAGCAA 29
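
The listing can be checked against itself: SEQ ID NO:1 is given as 432 base pairs and SEQ ID NO:2 as 62 amino acids, and 62 codons is exactly the span from nucleotide 247 to nucleotide 432. The following is a minimal sketch of that check, not part of the original listing. It assumes the standard genetic code and a transcription of SEQ ID NO:1 in which scanner misreads (the letter O where G belongs) have been repaired; the names CODON_TABLE and translate are illustrative only.

```python
# Standard genetic code, one-letter amino acids, codon positions ordered T, C, A, G.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# SEQ ID NO:1 as transcribed from the listing (assumption: OCR-repaired copy).
SEQ_ID_NO_1 = "".join("""
GGTTTGAAAA CTCTGCTTCC TTTGTGAATT TGCTGTTAGG AGTTCTTATT GTTATTCTGC
AGCCTTTACT ATTGTCCTTT ATTTACTGAA CACAGTGAAT ACCAAGCACT GTTTATTAGA
GGTTAGGAGT AGGGGCAGGT GATTAAAAAA ACAAAAAAGC TAATAATCTC CTCAAGCAAT
TTCTGGCCTA ATAGAATTAT AGTAGACAGT GAAGTATCTA AACCCAGGGA ATCAGATTGA
GGCACCATGT CCATCGCCTT GAGAATTAAT AGGCTGCATT TCTGGGTTCT CCNTTTTTTT
TTTTTTTTTG CCCAACTGAG TCTTTCTGTG GACTTACATG GAACTTCTTA TTCTCTTAAA
TCATTAAGTT ACTTGACAAT ATTCTTGGAT TTGGAGAAAC TGGATGTAGG GCCGTATGAA
AAAATCATTC GA
""".split())


def translate(dna):
    """Translate codon by codon; a codon containing N is reported as X (Xaa)."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


# Nucleotides 247 to 432, counted from 1 and inclusive at both ends.
coding = SEQ_ID_NO_1[247 - 1:432]
protein = translate(coding)

print(len(SEQ_ID_NO_1), len(coding), len(protein))
print(protein)
```

Under these assumptions the region is 186 nucleotides long and translates to 62 residues beginning Met-Ser-Ile, with the single N-containing codon giving the Xaa at position 16, which agrees with SEQ ID NO:2 as listed.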



What is claimed is: 

1. An isolated polynucleotide selected from the group
consisting of:

(a) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1;

(b) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1 from nucleotide 247 to nucleotide
432;

(c) a polynucleotide comprising the nucleotide sequence
of SEQ ID NO:1 from nucleotide 328 to nucleotide
432;

(d) a polynucleotide comprising the nucleotide sequence
of the full length protein coding sequence of clone
BD372_5 deposited under accession number ATCC
98146;

(e) a polynucleotide encoding the full length protein
encoded by the cDNA insert of clone BD372_5 deposited
under accession number ATCC 98146;

(f) a polynucleotide comprising the nucleotide sequence
of the mature protein coding sequence of clone
BD372_5 deposited under accession number ATCC
98146;

(g) a polynucleotide encoding the mature protein encoded
by the cDNA insert of clone BD372_5 deposited under
accession number ATCC 98146; and

(h) a polynucleotide encoding a protein comprising the
amino acid sequence of SEQ ID NO:2.

2. The polynucleotide of claim 1 comprising the nucleotide
sequence of SEQ ID NO:1.

3. The polynucleotide of claim 1 comprising the nucleotide
sequence of SEQ ID NO:1 from nucleotide 247 to
nucleotide 432.

4. The polynucleotide of claim 1 comprising the nucleotide
sequence of SEQ ID NO:1 from nucleotide 328 to
nucleotide 432.

5. The polynucleotide of claim 1 comprising the nucleotide
sequence of the full length protein coding sequence of
clone BD372_5 deposited under accession number ATCC
98146.

6. The polynucleotide of claim 1 encoding the full length
protein encoded by the cDNA insert of clone BD372_5
deposited under accession number ATCC 98146.

7. The polynucleotide of claim 1 comprising the nucleotide
sequence of the mature protein coding sequence of
clone BD372_5 deposited under accession number ATCC
98146.

8. The polynucleotide of claim 1 encoding the mature
protein encoded by the cDNA insert of clone BD372_5
deposited under accession number ATCC 98146.

9. The polynucleotide of claim 1 encoding a protein
comprising the amino acid sequence of SEQ ID NO:2.

10. A vector comprising a polynucleotide of claim 1
wherein said polynucleotide is operably linked to an
expression control sequence.

11. A host cell transformed with a vector of claim 2.

12. The host cell of claim 3, wherein said cell is a
mammalian cell.

13. A process for producing a protein, which comprises:

(a) growing a culture of the host cell of claim 3 in a
suitable culture medium; and

(b) purifying the protein from the culture.

14. An isolated gene corresponding to the cDNA sequence
of SEQ ID NO:1.

* * * * * 
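
The claims above identify fragments of SEQ ID NO:1 by nucleotide position ("from nucleotide 247 to nucleotide 432", "from nucleotide 328 to nucleotide 432"). The following is a minimal sketch of the arithmetic behind those ranges, assuming the usual sequence-listing convention that positions are counted from 1 and that both end points are included. The helper name claimed_range and the placeholder sequence are illustrative only and are not taken from the patent.

```python
def claimed_range(sequence, first, last):
    """Return nucleotides first..last, counted from 1, both ends included."""
    if not 1 <= first <= last <= len(sequence):
        raise ValueError("range must lie within the sequence")
    return sequence[first - 1:last]


# Placeholder standing in for the 432-nucleotide SEQ ID NO:1 (assumption:
# only the length matters for this arithmetic, so the bases are dummies).
placeholder = "N" * 432

for first, last in [(247, 432), (328, 432)]:
    fragment = claimed_range(placeholder, first, last)
    codons, remainder = divmod(len(fragment), 3)
    print(f"{first}-{last}: {len(fragment)} nt, {codons} codons, remainder {remainder}")
```

On this convention the first range spans 186 nucleotides (62 whole codons, the length given for SEQ ID NO:2) and the second spans 105 nucleotides (35 whole codons).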
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ABSTRACT 



The present invention relates to purified DNA sequences
encoding all or a portion of an osteoclast-specific or -related
gene product and a method for identifying such sequences.
The invention also relates to antibodies directed against an
osteoclast-specific or -related gene product. Also claimed
are DNA constructs capable of replicating DNA encoding all
or a portion of an osteoclast-specific or -related gene product,
and DNA constructs capable of directing expression in
a host cell of an osteoclast-specific or -related gene product.
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1 AGACACCTCT GCCCTCACCA TGAGCCTCTG GCAGCCCCTG GTCCTGGTGC TCCTGGTGCT 

61 GGGCTGCTGC TTTGCTGCCC CCAGACAGCG CCAGTCCACC CTTGTGCTCT TCCCTGGAGA 

121 CCTGAGAACC AATCTCACCG ACAGGCAGCT GGCAGAGGAA TACCTGTACC GCTATGGTTA 

181 CACTCGGGTG GCAGAGATGC GTGGAGAGTC GAAATCTCTG GGGCCTGCGC TGCTGCTTCT 

241 CCAGAAGCAA CTGTCCCTGC CCGAGACCGG TGAGCTGGAT AGCGCCACGC TGAAGGCCAT 

301 GCGAACCCCA CGGTGCGGGG TCCCAGACCT GGGCAGATTC CAAACCTTTG AGGGCGACCT

361 CAAGTGGCAC CACCACAACA TCACCTATTG GATCCAAAAC TACTCGGAAG ACTTGCCGCG 

421 GGCGGTGATT GACGACGCCT TTGCCCGCGC CTTCGCACTG TGGAGCGCGG TGACGCCGCT 

481 CACCTTCACT CGCGTGTACA GCCGGGACGC AGACATCGTC ATCCAGTTTG GTGTCGCGGA 

541 GCACGGAGAC GGGTATCCCT TCGACGGGAA GGACGGGCTC CTGGCACACG CCTTTCCTCC 

601 TGGCCCCGGC ATTCAGGGAG ACGCCCATTT CGACGATGAC GAGTTGTGGT CCCTGGGCAA 

661 GGGCGTCGTG GTTCCAACTC GGTTTGGAAA CGCAGATGGC GCGGCCTGCC ACTTCCCCTT

721 CATCTTCGAG GGCCGCTCCT ACTCTGCCTG CACCACCGAC GGTCGCTCCG ACGGGTTGCC 

781 CTGGTGCAGT ACCACGGCCA ACTACGACAC CGACGACCGG TTTGGCTTCT GCCCCAGCGA 

841 GAGACTCTAC ACCCGGGACG GCAATGCTGA TGGGAAACCC TGCCAGTTTC CATTCATCTT 

901 CCAAGGCCAA TCCTACTCCG CCTGCACCAC GGACGGTCGC TCCGACGGCT ACCGCTGGTG

961 CGCCACCACC GCCAACTACG ACCGGGACAA GCTCTTCGGC TTCTGCCCGA CCCGAGCTGA

1021 CTCGACGGTG ATGGGGGGCA ACTCGGCGGG GGAGCTGTGC GTCTTCCCCT TCACTTTCCT 

1081 GGGTAAGGAG TACTCGACCT GTACCAGCGA GGGCCGCGGA GATGGGCGCC TCTGGTGCGC 

1141 TACCACCTCG AACTTTGACA GCGACAAGAA GTGGGGCTTC TGCCCGGACC AAGGATACAG

1201 TTTGTTCCTC GTGGCGGCGC ATGAGTTCGG CCACGCGCTG GGCTTAGATC ATTCCTCAGT 

1261 GCCGGAGGCG CTCATGTACC CTATGTACCG CTTCACTGAG GGGCCCCCCT TGCATAAGGA 

1321 CGACGTGAAT GGCATCCGGC ACCTCTATGG TCCTCGCCCT GAACCTGAGC CACGGCCTCC 

1381 AACCACCACC ACACCGCAGC CCACGGCTCC CCCGACGGTC TGCCCCACCG GACCCCCCAC 

1441 TGTCCACCCC TCAGAGCGCC CCACAGCTGG CCCCACAGGT CCCCCCTCAG CTGGCCCCAC 

1501 AGGTCCCCCC ACTGCTGGCC CTTCTACGGC CACTACTGTG CCTTTGAGTC CGGTGGACGA 

1561 TGCCTGCAAC GTGAACATCT TCGACGCCAT CGCGGAGATT GGGAACCAGC TGTATTTGTT 

1621 CAAGGATGGG AAGTACTGGC GATTCTCTGA GGGCAGGGGG AGCCGGCCGC AGGGCCCCTT 

1681 CCTTATCGCC GACAAGTGGC CCGCGCTGCC CCGCAAGCTG GACTCGGTCT TTGAGGAGGC

1741 GCTCTCCAAG AAGCTTTTCT TCTTCTCTGG GCGCCAGGTG TGGGTGTACA CAGGCGCGTC 

1801 GGTGCTGGGC CCGAGGCGTC TGGACAAGCT GGGCCTGGGA GCCGACGTGG CCCAGGTGAC

1861 CGGGGCCCTC CGGAGTGGCA GGGGGAAGAT GCTGCTGTTC AGCGGGCGGC GCCTCTGGAG

1921 GTTCGACGTG AAGGCGCAGA TGGTGGATCC CCGGAGCGCC AGCGAGGTGG ACCGGATGTT

1981 CCCCGGGGTG CCTTTGGACA CGCACGACGT CTTCCAGTAC CGAGAGAAAG CCTATTTCTG 

2041 CCAGGACCGC TTCTACTGGC GCGTGAGTTC CCGGAGTGAG TTGAACCAGG TGGACCAAGT

2101 GGGCTACGTG ACCTATGACA TCCTGCAGTG CCCTGAGGAC TAGGGCTCCC GTCCTGCTTT 

2161 GCAGTGCCAT GTAAATCCCC ACTGGGACGA ACCCTGGGGA AGGAGCCAGT TTGCCGGATA 

2221 CAAACTGGTA TTCTGTTCTG GAGGAAAGGG AGGAGTGGAG GTGGGCTGGG CCCTCTCTTC 

2281 TCACCTTTGT TTTTTGTTGG AGTGTTTCTA ATAAACTTGG ATTCTCTAAC CTTT 



Figure 1 
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HUMAN OSTEOCLAST-SPECIFIC AND 
-RELATED GENES

RELATED APPLICATION 

This application is a continuation of application Ser. No.
08/045,270, filed on Apr. 6, 1993, now abandoned.

BACKGROUND OF THE INVENTION 

Excessive bone resorption by osteoclasts contributes to
the pathology of many human diseases including arthritis,
osteoporosis, periodontitis, and hypercalcemia of malignancy.
During resorption, osteoclasts remove both the mineral
and organic components of bone (Blair, H. C., et al., J.
Cell Biol. 102:1164 (1986)). The mineral phase is solubilized
by acidification of the sub-osteoclastic lacuna, thus
allowing dissolution of hydroxyapatite (Vaes, G., Clin.
Orthop. Relat. 231:239 (1988)). However, the mechanism(s)
by which type I collagen, the major structural protein of
bone, is degraded remains controversial. In addition, the
regulation of osteoclastic activity is only partly understood.
The lack of information concerning osteoclast function is
due in part to the fact that these cells are extremely difficult
to isolate as pure populations in large numbers. Furthermore,
there are no osteoclastic cell lines available. An approach to
studying osteoclast function that permits the identification of
heretofore unknown osteoclast-specific or -related genes and
gene products would allow identification of genes and gene
products that are involved in the resorption of bone and in
the regulation of osteoclastic activity. Therefore, identification
of osteoclast-specific or -related genes or gene products
would prove useful in developing therapeutic strategies for
the treatment of disorders involving aberrant bone resorption.
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SUMMARY OF THE INVENTION 



The present invention relates to isolated DNA sequences
encoding all or a portion of osteoclast-specific or -related
gene products. The present invention further relates to DNA
constructs capable of replicating DNA encoding osteoclast-specific
or -related gene products. In another embodiment,
the invention relates to a DNA construct capable of directing
expression of all or a portion of the osteoclast-specific or
-related gene product in a host cell.

Also encompassed by the present invention are prokaryotic
or eukaryotic cells transformed or transfected with a
DNA construct encoding all or a portion of an osteoclast-specific
or -related gene product. According to a particular
embodiment, these cells are capable of replicating the DNA
construct comprising the DNA encoding the osteoclast-specific
or -related gene product, and, optionally, are capable
of expressing the osteoclast-specific or -related gene product.
Also claimed are antibodies raised against osteoclast-specific
or -related gene products, or portions of these gene
products.

The present invention further embraces a method of
identifying osteoclast-specific or -related DNA sequences
and DNA sequences identified in this manner. In one
embodiment, cDNA encoding osteoclast-specific or -related
gene products is identified as follows: First, human giant
cell tumor of the bone was used to 1) construct a cDNA
library; 2) produce ³²P-labelled cDNA to use as a stromal
cell⁺, osteoclast⁺ probe; and 3) produce (by culturing) a
stromal cell population lacking osteoclasts. The presence of
osteoclasts in the giant cell tumor was confirmed by
histological staining for the osteoclast marker, type 5
tartrate-resistant acid phosphatase (TRAP), and with the use
of monoclonal antibody reagents.

The stromal cell population lacking osteoclasts was produced
by dissociating cells of a giant cell tumor, then
growing and passaging the cells in tissue culture until the
cell population was homogeneous and appeared fibroblastic.
The cultured stromal cell population did not contain osteoclasts.
The cultured stromal cells were then used to produce
a stromal cell⁺, osteoclast⁻ ³²P-labelled cDNA probe.

The cDNA library produced from the giant cell tumor of
the bone was then screened in duplicate for hybridization to
the cDNA probes: one screen was performed with the giant
cell tumor cDNA probe (stromal cell⁺, osteoclast⁺), while a
duplicate screen was performed using the cultured stromal
cell cDNA probe (stromal cell⁺, osteoclast⁻). Hybridization
to a stromal⁺, osteoclast⁺ probe, accompanied by failure to
hybridize to a stromal⁺, osteoclast⁻ probe, indicated that a
clone contained nucleic acid sequences specifically
expressed by osteoclasts.
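
The decision rule of the duplicate screen amounts to a small truth table. The following Python sketch is illustrative only; the function name and the category labels are assumptions introduced here and do not appear in the specification.

```python
def classify_clone(hybridizes_tumor_probe: bool,
                   hybridizes_stromal_probe: bool) -> str:
    """Interpret one clone's signals in the duplicate screen.

    tumor probe   = giant cell tumor cDNA (stromal+, osteoclast+)
    stromal probe = cultured stromal cell cDNA (stromal+, osteoclast-)
    """
    if hybridizes_tumor_probe and not hybridizes_stromal_probe:
        # Signal only with the probe that contains osteoclast sequences.
        return "osteoclast-specific or -related candidate"
    if hybridizes_stromal_probe:
        # The sequence is also expressed by the cultured stromal cells.
        return "not osteoclast-specific"
    return "no signal"


print(classify_clone(True, False))
```

Only the first branch corresponds to the clones that the screen retains.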

In another embodiment, genomic DNA encoding osteoclast-specific
or -related gene products is identified through
known hybridization techniques or amplification techniques.
In one embodiment, the present invention relates to a
method of identifying DNA encoding an osteoclast-specific
or -related protein, or gene product, by screening a cDNA
library or a genomic DNA library with a DNA probe
comprising one or more sequences selected from the group
consisting of the DNA sequences set out in Table I (SEQ ID
NOs: 1-32). Finally, the present invention relates to an
osteoclast-specific or -related protein encoded by a nucleotide
sequence comprising a DNA sequence selected from
the group consisting of the sequences set out in Table I, or
their complementary strands.
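
As an illustration of what "complementary strand" means for a listed sequence, the sketch below computes a reverse complement. It is a minimal example under stated assumptions: the helper name is hypothetical, and the input is simply the first ten bases printed for clone 34A in Table I.

```python
# Watson-Crick pairing for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("CCAAATATCT"))  # AGATATTTGG
```

Characters other than A, C, G and T are passed through unchanged, so uncertain positions in a listing remain visible in the output.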

BRIEF DESCRIPTION OF FIG. 1 

FIG. 1 shows the cDNA sequence (SEQ ID NO: 33) of
human gelatinase B, and highlights those portions of the
sequence represented by the osteoclast-specific or -related
cDNA clones of the present invention.
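
A sequence printed in the numbered, ten-base-block layout of FIG. 1 can be turned back into one contiguous string before analysis. The following is a sketch only; the function name is hypothetical and the two short rows in the usage line are abbreviated for illustration.

```python
def join_numbered_rows(rows):
    """Concatenate rows of the form '<position> BLOCK BLOCK ...'."""
    parts = []
    for row in rows:
        tokens = row.split()
        if not tokens:
            continue  # skip blank spacer lines
        if tokens[0].isdigit():
            tokens = tokens[1:]  # drop the leading position number
        parts.extend(tokens)
    return "".join(parts)


print(join_numbered_rows(["1 AGACACCTCT GCCCTCACCA", "", "21 TGAGCCTCTG"]))
```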

DETAILED DESCRIPTION OF THE 
INVENTION 

As described herein, Applicant has identified osteoclast-specific
or osteoclast-related nucleic acid sequences. These
sequences were identified as follows: Human giant cell
tumor of the bone was used to 1) construct a cDNA library;
2) produce ³²P-labelled cDNA to use as a stromal cell⁺,
osteoclast⁺ probe; and 3) produce (by culturing) a stromal
cell population lacking osteoclasts. The presence of osteoclasts
in the giant cell tumor was confirmed by histological
staining for the osteoclast marker, type 5 acid phosphatase
(TRAP). In addition, monoclonal antibody reagents were
used to characterize the multinucleated cells in the giant cell
tumor, which cells were found to have a phenotype distinct
from macrophages and consistent with osteoclasts.

The stromal cell population lacking osteoclasts was produced
by dissociating cells of a giant cell tumor, then
growing the cells in tissue culture for at least five passages.
After five passages the cultured cell population was homogeneous
and appeared fibroblastic. The cultured population
contained no multinucleated cells at this point, tested negative
for type 5 acid phosphatase, and tested variably alkaline
phosphatase positive. That is, the cultured stromal cell
population did not contain osteoclasts. The cultured stromal
cells were then used to produce a stromal cell⁺, osteoclast⁻
³²P-labelled cDNA probe.

The cDNA library produced from the giant cell tumor of
the bone was then screened in duplicate for hybridization to
the cDNA probes: one screen was performed with the giant
cell tumor cDNA probe (stromal cell⁺, osteoclast⁺), while a
duplicate screen was performed using the cultured stromal
cell cDNA probe (stromal cell⁺, osteoclast⁻). Clones that
hybridized to the giant cell tumor cDNA probe (stromal⁺,
osteoclast⁺), but not to the stromal cell cDNA probe
(stromal⁺, osteoclast⁻), were assumed to contain nucleic acid
sequences specifically expressed by osteoclasts.

As a result of the differential screen described herein,
DNA specifically expressed in osteoclast cells characterized
as described herein was identified. This DNA, and equivalent
DNA sequences, is referred to herein as osteoclast-specific
or osteoclast-related DNA. Osteoclast-specific or
-related DNA of the present invention can be obtained from
sources in which it occurs in nature, can be produced
recombinantly or synthesized chemically; it can be cDNA,
genomic DNA, recombinantly-produced DNA or chemically-produced
DNA. An equivalent DNA sequence is one
which hybridizes, under standard hybridization conditions,
to an osteoclast-specific or -related DNA identified as
described herein or to a complement thereof.

Differential screening of a human osteoclastoma cDNA
library was performed to identify genes specifically
expressed in osteoclasts. Of 12,000 clones screened, 195
clones were identified which are either uniquely expressed
in osteoclasts, or are osteoclast-related. These clones were
further identified as osteoclast-specific, as evidenced by
failure to hybridize to mRNA derived from a variety of
unrelated human cell types, including epithelium, fibroblasts,
lymphocytes, myelomonocytic cells, osteoblasts, and
neuroblastoma cells. Of these, 32 clones contain novel
cDNA sequences which were not found in the GenBank
database.

A large number of cDNA clones obtained by this procedure
were found to represent 92 kDa type IV collagenase
(gelatinase B; E.C. 3.4.24.35) as well as tartrate resistant
acid phosphatase. In situ hybridization localized mRNA for
gelatinase B to multinucleated giant cells in human osteoclastomas.
Gelatinase B immunoreactivity was demonstrated
in giant cells from 8/8 osteoclastomas, osteoclasts in
normal bone, and in osteoclasts of Paget's disease by use of
polyclonal antisera raised against a synthetic gelatinase B
peptide. In contrast, no immunoreactivity for 72 kDa type IV
collagenase (gelatinase A; E.C. 3.4.24.24), which is the
product of a separate gene, was detected in osteoclastomas
or normal osteoclasts.

The present invention has utility for the production and
identification of nucleic acid probes useful for identifying
osteoclast-specific or -related DNA. Osteoclast-specific or
-related DNA of the present invention can be used to
produce osteoclast-specific or -related gene products useful
in the therapeutic treatment of disorders involving aberrant
bone resorption. The osteoclast-specific or -related
sequences are also useful for generating peptides which can
then be used to produce antibodies useful for identifying
osteoclast-specific or -related gene products, or for altering
the activity of osteoclast-specific or -related gene products.
Such antibodies are referred to as osteoclast-specific antibodies.
Osteoclast-specific antibodies are also useful for
identifying osteoclasts. Finally, osteoclast-specific or -related
DNA sequences of the present invention are useful in
gene therapy. For example, they can be used to alter the
expression in osteoclasts of an aberrant osteoclast-specific
or -related gene product or to correct aberrant expression of
an osteoclast-specific or -related gene product. The
sequences described herein can further be used to cause
osteoclast-specific or -related gene expression in cells in
which such expression does not ordinarily occur, i.e., in cells
which are not osteoclasts.

Example 1 — Osteoclast cDNA Library Construction

Messenger RNA (mRNA) obtained from a human osteoclastoma
('giant cell tumor of bone') was used to construct
an osteoclastoma cDNA library. Osteoclastomas are actively
bone resorptive tumors, but are usually non-metastatic. In
cryostat sections, osteoclastomas consist of ~30% multinucleated
cells positive for tartrate resistant acid phosphatase
(TRAP), a widely utilized phenotypic marker specific
in vivo for osteoclasts (Minkin, Calcif. Tissue Int.
34:285-290 (1982)). The remaining cells are uncharacterized
'stromal' cells, a mixture of cell types with fibroblastic/
mesenchymal morphology. Although it has not yet been
definitively shown, it is generally held that the osteoclasts in
these tumors are non-transformed, and are activated to
resorb bone in vivo by substance(s) produced by the stromal
cell element.

Monoclonal antibody reagents were used to partially
characterize the surface phenotype of the multinucleated
cells in the giant cell tumors of long bone. In frozen sections,
all multinucleated cells expressed CD68, which has previously
been reported to define an antigen specific for both
osteoclasts and macrophages (Horton, M. A. and M. H.
Helfrich, In Biology and Physiology of the Osteoclast, B. R.
Rifkin and C. V. Gay, editors, CRC Press, Inc. Boca Raton,
Fla., 33-54 (1992)). In contrast, no staining of giant cells
was observed for CD11b or CD14 surface antigens, which
are present on monocyte/macrophages and granulocytes
(Arnaout, M. A. et al., J. Cell Physiol. 137:305 (1988);
Haziot, A. et al., J. Immunol. 141:547 (1988)). Cytocentrifuge
preparations of human peripheral blood monocytes
were positive for CD68, CD11b, and CD14. These results
demonstrate that the multinucleated giant cells of osteoclastomas
have a phenotype which is distinct from that of
macrophages, and which is consistent with that of osteoclasts.

Osteoclastoma tissue was snap frozen in liquid nitrogen
and used to prepare poly A⁺ mRNA according to standard
methods. cDNA cloning into a pcDNAII vector was carried
out using a commercially-available kit (Librarian, InVitrogen).
Approximately 2.6×10⁶ clones were obtained, >95%
of which contained inserts of an average length of 0.6 kb.

Example 2 — Stromal Cell mRNA Preparation 

A portion of each osteoclastoma was snap frozen in liquid
nitrogen for mRNA preparation. The remainder of the tumor
was dissociated using brief trypsinization and mechanical
disaggregation, and placed into tissue culture. These cells
were expanded in Dulbecco's MEM (high glucose, Sigma)
supplemented with 10% newborn calf serum (MA Bioproducts),
gentamycin (0.5 mg/ml), L-glutamine (2 mM) and
non-essential amino acids (0.1 mM) (Gibco). The stromal
cell population was passaged at least five times, after which
it showed a homogeneous, fibroblastic-looking cell population
that contained no multinucleated cells. The stromal cells
were mononuclear, tested negative for acid phosphatase, and
tested variably alkaline phosphatase positive. These findings
indicate that propagated stromal cells (i.e., stromal cells that
are passaged in culture) are non-osteoclastic and non-activated.

Example 3 — Identification of DNA Encoding
Osteoclastoma-Specific or -Related Gene Products
by Differential Screening of an Osteoclastoma
cDNA Library

A total of 12,000 clones drawn from the osteoclastoma
cDNA library were screened by differential hybridization,
using mixed ³²P-labelled cDNA probes derived from (1)
giant cell tumor mRNA (stromal cell⁺, OC⁺), and (2) mRNA
from stromal cells (stromal cell⁺, OC⁻) cultivated from the
same tumor. The probes were labelled with [³²P]dCTP by
random priming to an activity of ~10⁹ CPM/µg. Of these
12,000 clones, 195 gave a positive hybridization signal with
giant cell (i.e., osteoclast and stromal cell) mRNA, but not
with stromal cell mRNA. Additionally, these clones failed to
hybridize to cDNA produced from mRNA derived from a
variety of unrelated human cell types including epithelial
cells, fibroblasts, lymphocytes, myelomonocytic cells,
osteoblasts, and neuroblastoma cells. The failure of these
clones to hybridize to cDNA produced from mRNA derived
from other cell types supports the conclusion that these
clones are either uniquely expressed in osteoclasts, or are
osteoclast-related.
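
For orientation, the yield of the differential screen follows directly from the two counts given above. The short Python sketch below only restates that arithmetic; the variable names are chosen here for illustration.

```python
clones_screened = 12_000      # clones drawn from the osteoclastoma library
differential_clones = 195     # tumor-probe positive, stromal-probe negative

# Fraction of screened clones retained by the differential screen.
yield_fraction = differential_clones / clones_screened
print(f"{yield_fraction:.3%}")  # 1.625%
```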

The osteoclast (OC) cDNA library was screened for
differential hybridization to OC cDNA (stromal cell⁺, OC⁺)
and stromal cell cDNA (stromal cell⁺, OC⁻) as follows:

NYTRAN filters (Schleicher & Schuell) were placed on
agar plates containing growth medium and ampicillin. Individual
bacterial colonies from the OC library were randomly
picked and transferred, in triplicate, onto filters with preruled
grids and then onto a master agar plate. Up to 200
colonies were inoculated onto a single 90-mm filter/plate
using these techniques. The plates were inverted and incubated
at 37° C. until the bacterial inoculates had grown (on
the filter) to a diameter of 0.5-1.0 mm.
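
The plating density fixes a lower bound on the number of filters per replicate set. The sketch below is a rough calculation that assumes every filter is filled to the stated maximum of 200 colonies; the function name is hypothetical.

```python
import math


def filters_needed(n_colonies: int, colonies_per_filter: int = 200) -> int:
    """Minimum number of gridded filters for one replicate set."""
    return math.ceil(n_colonies / colonies_per_filter)


print(filters_needed(12_000))  # 60
```

Because colonies were transferred in triplicate, the total number of filters handled would be a multiple of this figure.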

The colonies were then lysed, and the DNA bound to the
filters, by first placing the filters on top of two pieces of
Whatman 3 MM paper saturated with 0.5N NaOH for 5
minutes. The filters were neutralized by placing them on two
pieces of Whatman 3 MM paper saturated with 1M Tris-HCl,
pH 8.0 for 3-5 minutes. Neutralization was followed
by incubation on another set of Whatman 3 MM papers
saturated with 1M Tris-HCl, pH 8.0/1.5M NaCl for 3-5
minutes. The filters were then washed briefly in 2xSSC.

DNA was immobilized on the filters by baking the filters
at 80° C. for 30 minutes. Filters were best used immediately,
but they could be stored for up to one week in a vacuum jar
at room temperature.

Filters were prehybridized in 5-8 ml of hybridization
solution per filter, for 2-4 hours in a heat-sealable bag. An
additional 2 ml of solution was added for each additional
filter added to the hybridization bag. The hybridization
buffer consisted of 5xSSC, 5xDenhardt's solution, 1% SDS
and 100 µg/ml denatured heterologous DNA.
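
The volume rule can be written as a small helper. This sketch rests on one reading of the text, flagged here as an assumption: the first filter in a bag takes 5-8 ml and each further filter adds 2 ml. The function name is hypothetical.

```python
def prehybridization_volume_ml(n_filters: int) -> tuple:
    """Return the (minimum, maximum) solution volume in ml for one bag.

    Assumes 5-8 ml for the first filter plus 2 ml per additional filter.
    """
    if n_filters < 1:
        raise ValueError("at least one filter is required")
    extra = 2 * (n_filters - 1)
    return (5 + extra, 8 + extra)


print(prehybridization_volume_ml(4))  # (11, 14)
```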

Prior to hybridization, labeled probe was denatured by
heating in 1xSSC for 5 minutes at 100° C., then immediately
chilled on ice. Denatured probe was added to the filters in
hybridization solution, and the filters hybridized with continuous
agitation for 12-20 hours at 65° C.

After hybridization, the filters were washed in 2xSSC/
0.2% SDS at 50°-60° C. for 30 minutes, followed by
washing in 0.2xSSC/0.2% SDS at 60° C. for 60 minutes.
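
The SSC strengths used in the hybridization and wash steps can be converted to salt concentrations. The sketch below assumes the conventional definition of SSC (1x = 0.15 M NaCl, 0.015 M sodium citrate), which the specification itself does not spell out; the names are illustrative.

```python
# Conventional 1x SSC composition (an assumption, not stated in the text).
NACL_M_AT_1X = 0.15
CITRATE_M_AT_1X = 0.015


def ssc_molarity(strength: float) -> dict:
    """Molar NaCl and sodium citrate for an n-fold SSC solution."""
    return {
        "NaCl_M": strength * NACL_M_AT_1X,
        "citrate_M": strength * CITRATE_M_AT_1X,
    }


# Hybridization buffer, first wash, second (more stringent) wash.
for strength in (5, 2, 0.2):
    print(strength, ssc_molarity(strength))
```

The second wash lowers the salt concentration tenfold relative to the first, which is what makes it the more stringent step.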

The filters were then air dried and autoradiographed using
an intensifying screen at -70° C. overnight.

Example 4 — DNA Sequencing of Selected Clones 

Clones reactive with the mixed tumor probe, but unreactive
with the stromal cell probe, are expected to contain
either osteoclast-related, or in vivo 'activated' stromal-cell-related
gene products. One hundred and forty-four cDNA
clones that hybridized to tumor cell cDNA, but not to
stromal cell cDNA, were sequenced by the dideoxy chain
termination method of Sanger et al. (Sanger F., et al., Proc.
Natl. Acad. Sci. USA 74:5463 (1977)) using Sequenase (US
Biochemical). The DNASIS (Hitachi) program was used to
carry out sequence analysis and a homology search in the
GenBank/EMBL database.

Fourteen of the 195 tumor⁺ stromal⁻ clones were identified
as containing inserts with a sequence identical to the
osteoclast marker, type 5 tartrate-resistant acid phosphatase
(TRAP) (GenBank accession number J04430 M19534). The
high representation of TRAP positive clones also indicates
the effectiveness of the screening procedure in enriching for
clones which contain osteoclast-specific or -related cDNA
sequences.

Interestingly, an even larger proportion of the tumor⁺
stromal⁻ clones (77/195; 39.5%) were identified as human
gelatinase B (macrophage-derived gelatinase) (Wilhelm, S.
M., J. Biol. Chem. 264:17213 (1989)), again indicating high
expression of this enzyme by osteoclasts. Twenty-five of the
gelatinase B clones were identified by dideoxy sequence
analysis; all 25 showed 100% sequence homology to the
published gelatinase B sequence (GenBank accession number
J05070). The portions of the gelatinase B cDNA
sequence covered by these clones are shown in the FIGURE
(SEQ ID NO: 33). An additional 52 gelatinase B clones were
identified by reactivity with a ³²P-labelled probe for gelatinase B.
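
The clone counts quoted in this example can be tallied to check the stated proportion. The sketch below uses only figures given in the text; the variable names are chosen here for illustration.

```python
differential_clones = 195   # tumor-probe positive, stromal-probe negative
trap_clones = 14            # identical to type 5 TRAP
gelatinase_sequenced = 25   # identified by dideoxy sequencing
gelatinase_by_probe = 52    # identified with a labelled gelatinase B probe

gelatinase_total = gelatinase_sequenced + gelatinase_by_probe
gelatinase_pct = round(100 * gelatinase_total / differential_clones, 1)
trap_pct = round(100 * trap_clones / differential_clones, 1)

print(gelatinase_total, gelatinase_pct, trap_pct)  # 77 39.5 7.2
```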

Thirteen of the sequenced clones yielded no readable
sequence. A DNASIS search of GenBank/EMBL databases
revealed that, of the remaining 91 clones, 32 clones contain
novel sequences which have not yet been reported in the
databases or in the literature. These partial sequences are
presented in Table I. Note that three of these sequences were
repeats, indicating fairly frequent representation of mRNA
related to this sequence. The repeat sequences are indicated
by superscripts (Clones 198B, 223B and 32C of Table I).



TABLE I

PARTIAL SEQUENCES OF 32 NOVEL OC-SPECIFIC OR -RELATED
EXPRESSED GENES (cDNA CLONES)

34A (SEQ ID NO: 1)
  1 CCAAATATCT AAGTTTATTG CTTGCATTTC TAOTGAGAGC TGTTGAATTT GGTOAXGTCA
 61 AATGTTTCTA GCCTTTTTTT ACTTTGTTTT TATTGAAAAA TTTAATTATT TATGCTAJAG
121 GTGATATTCT CTTTGAATAA ACCTATAATA CAAAATACCA GCACACAACA
4B (SEQ ID NO: 2)
  1 GTCTCAACCT GCATXrCCTA AAAATCTCAA AATOCTCCAT CrOOTTAATG TCGOGGTAGG



61 GGO 
12fl(SEQ CD NO: 3) 
1 CTTCCCTCTC 
61 CAGGCCCACA 
12! CAAOCAGCTG 
28B (S EQIP NO: 4) 
t TTTT ATTTGT 
61 GTGTGTTTTC 
12] AAAOCAAACT 
37B (SEQ ID NO: 5) 
1 GGCTGGACAT 
61 7TGCCCTGGC 
121 ACCCACTTTG 
181 ACAAAAAAAA 
55B (SEQ ID NO: 6} 
1 TTGACAAAGC 
61 AAGAGTAGTG 
121 TAATTTGCCT 
60B (SEQ ID NO: 7) 
1 GAAGAGAGTT 
61 GATCCCGAGG 
86B (SEQ ID NO: 8) 
1 GGATOGAAAC 
61 GCAAACCTGA 
121 TGGTTGCTGT 
87B (SEQ ED NO: 9) 
1 TTCTTGATCT 
61 TAGGAGCCGT 
181 CAATGATAAA 
933 (SEQ ID NO: 10) 
1 ACCCATTTCT 
61 CTCAAAGAAT 
121 GAATATGAGG 
HOB (SEQ ID NO: It) 
I ACATATATTA 
61 TAAAGTGCGA 

121 TAAcrrmT 

MSB (SEQ ID NO 12) 
1 CCAAATTTCT 
61 TITGACTACr 
133B (SEQ ID NO: 13) 
I AACTAACCTC 
61 CCTGAGCCAT 
121 AAAT 
140B (SEQ ID NO. 14) 
1 ATTATTATTC 
61 AAAACACACA 
121 GATAAACCCG 
I44B (SEQ ED NO: 15) 
1 CGTGACACAA 
61 AACAGCATGT 
198B* (SEQ ED NO. 16) 
1 ATAGGTTAGA 
61 ATCTGACTTC 
121 TCTACTCCAA 
181 ATGTGATTTG 
241 TTTAAT 
212B(SEQIDNO: 17) 
I GTCCAGTATA 
61 CCTCTAGATA 
121 AATGCCCTTC 
181 TCTGGAGC 
223B»(SEQIDN0: 18) 
1 GCACTTGGAA 
61 TGTTCAGTTT 
121 CCATGACCTT 
181 TAAGAGATGT 
241B (SEQIP NO: 19 ) 
1 _ TGTTAGTTTT 
61 CTAGACGTCC 
121 GGAAGGGCTC 
181 CTATATOAGC 
32C* (SEQ ID NO: 20) 
I CCTATTTCTG 
121 TCCGTCTACC 
161 GGGTGGAAGG 



TTGCTTCCCT 
CGGAGTACTG 
GTGGTGAATG 


TTCCCAAGCA 
OCAGACTACT 
CTGCCTGCCA 


GAGCTGCTCA 
GCTGATGTTC 
CGGGACCCCC 


CTCCATGGCC 
TCTTAAGGCC 

ccc 


ACCGCCACCA 
CAGGGAGTCT 


AAATATATGT 
GTCTTGCTTC 
CCCGGGATGO 


ATTACATCCC 
TTCATGGTCC 
AAGCAGATTA 


TAGAAAAAGA 
ATGATGCCAG 
TTCTGCCATT 


ATCCCAGGAT 
CTGAGGTTGT 
TTTCCAGGTC 


TTTCCCTCCT 
CAGTACAATG 
TTT 


CCGTGCOCTC 
CATGTCATCT 
TTAGGOGACG 
AAAAAAA 


CACGTCCCTC 
ACCTGGAGTG 
AnTCCCAGA 


ATATCCCCAG 

GGOCCTOCCCC 

CCACTCATCA 


GCACACTCTG 
TTCTTCAGCC 
CATTAAAAAA 


GCCTCAGGTT 
TTOAATCAAA 
.TATTTTOAAA 


TGTTTATTTC 
GCTATTATAT 
TC 


CACCAATAAA 
GGCGTATCAT 


TAGTATATGG 
GTTGATGCTC 


TGATTGGGGT 
ATAAATAGTT 


TTCTATTTAT ' 
CATATCTACT 


GTATGTACAA 
GAATT 


CCOCAACAGG 


CAAGGCAGCT 


AAATGCAGAG 


GGTACAGAGA 


ATGTAGAAGT 
GATTTCAGCA 
TGCACGTATC 


CCAGAGAAAA 

TAAAATCTTT 

AATACGTTAT 


ACAA3TTTAA 
AGTTAGAAGT 
C 


AAAAAGGTGG 
OAOAGAAAGA 


AAAAGTTACG 
AGACGGAGGC 


TTAGAACACT 
GCTnTGGAA 
ACTTGACAAA 


ATGAATAGGC 
TGCTTGAGTG 
A 


AAAAAAGAAA 
AGGAGCTCAA 


AAACTGTTCA 
CAAGTCCTCT 


AAATAAAATG 
CCCAAGAAAG 


AACAATTTTT 
AGAGGCAATA 
ACAAGCTCTA 


ACTGTAAAAT 
TATAGCCCAT 
GTGGTCATTA 


TT7TGGTCAA 
CTTACTAGAC 
AACCCCTCAG 


AGTTCTAAGC 
ATACAGTATT 
AA 


TTAATCACAT 
AAACTGGACT 


ACAGCATTCA 
ATGTATCAAG 
TTTTTACATT 


TTTGGCCAAA 

TATAGACTAT 

ATAAAATTAA 


ATCTACACGT 
GAAAGTGCAA 
CTTGTTT 


TTGTAGAATC 
ATAACAAGTC 


CTACTGTATA 
AAGGTTAGAT 


CTGGAATCCA 
CCAGC 


TCcrcccrcc 


CATCACCATA 


GCCTCGAGAC 


GTCATTTCTG 


CTCGGACCCC 
GCCCATCCCT 


TGCCrCACTC 
TATOAGCOGC 


ATTTACACCA 
OCAGTGATTA 


ACCAOCCAAC 
TAGCCTTTCG 


TATCTATAAA 
CTCTAAGATA 


1 1 ITJTIATG 
TCCCATTGAA 
GCACGTCCTG 


TTAGCTTAGC 
GGGTTTTGTA 
ATAGCAAATT 


CATGCAAAAT 
CATTTCAGTC 
C 


TTACTCGTGA 
CTTACAAATA 


AGCAGTTAAT 
ACAAAGCAAT 


ACATGCATTC 
TCATCAGCAG 


GTTTTATTCA 
GAAGCTGGCC 


TAAAACAGCC 
GTGGGCAGGG 


TGGTTTOCTA 
CCGCC 


AAACAATACA 


TTCTCATTCA 
TCACTTCCTA 
TTCATAAATC 
TCTTCCCTTC 


CGCGACTAGT 
AGTTCCCTCT 
TATTCATAAG 
TTTGCACTTT 


TAGCTTTAAG 
TATATCCTCA 
TCTTTGGTAC 
TRAAATAAAG 


CACCCTAGAG 
AGGTAGAAAT 
AAGTTACATG 
TATTTATCTC 


GACTAGGGTA 
GTCTATGTTT 
ATAAAAAGAA 
CTGTCTACAG 


AAGGAAAGCG 
AAACACCCGA 
TACACATTAG 


TTAAGTCGGT 
TTAACAGATG 
CTCCAGCTAA 


AAGCTAGAGG 

TTAACLTl'IT 

AAAGACACAT 


ATTGTAAATA 
ATGTTTTGAT 
TGAGAGCTTA 


TCTTTTATGT 
TTGCTTTAAA 
GAGGATAGTC 


CGGAGTTGCT 
CCCCATTTGT 
TTTCACTGTC 
GACTACACCC 


GTGCTATTTT 
TTGTGCTTCA 
CCCATCAAGG 
TCCCCCTGAC 


TGAAGCAGAT 
AATGATCCTT 
ACTTTCCrGA 
TG 


CTGGTGATAC 
CCTACTTTGC 
CAGCTTGTGT 


TGAOATTGTC 
TTCTCTCCAC 
ACTCTTAGGC 



TACGAAGGCC 
TATAGTTAGT 
TTTGCTAGTA 
ATAGTAAGGC 

ATCCTGACTT 

AGAGCGTGCA 

GGCAGGATTC 



TGTCTTCTGG 
CACTCGGGAT 
TCTCCATTTC 
TGT 

TGGACAAGGC 

CTTGTGATCC 

TCCAGCTGCT 



GAGTOAGGTT 
GGTGAAAGAG 
TAGAAGATGG 



CCTTCAGCCA 
TAAAATAAGC 
TTTGCATTTC 



TATTAGTCCA 

GGAGAAGAGG 

TTTAGATGAT 



OAAGACTGAC 
TTCATCTCCC 
TC I'lCL I AAA 



CTTCTTGGAG 
AAGCGCGAAG 
AACCACAGGT 



AAAGTCATCC 

GCTGTGCCTT 
TTTCATT



WC (SEQ ID NO: 21) 
1 COGAGO0TA0 
6i CCGCCCCCAC 
47C(SEQIDN0:22) 
1 TTAGTTCAGT 
61 CTGCCACCTG 
121 GGAGCTGACC 
65C (SEQ ID NO: 23) 
1 GCTGAATCTT 
61 TCCAAOTGTG 
121 AACTGCCCQT 
79C (SEQ ID NO: 24) 
1 GGCACTCGGA 
61 AGAAAACTCC . 
121 CATTGCCAAC 
84C (SEQ ID NO: 25) 
1 GCCAGGGOGG 
61 GACCTGCAGT 
121 OGTGCCFGAG 
86C (SEQ ID NO: 2 6) 
1 AACTCTTTCA 
61 GTTCATATCA 
121 TTCAATTATA 
S7C (SEQ ID NO: 27) 
1 GGATAAGAAA 
61 CGCAGCAGCC 
121 GTCCTGGTTG 
88C (SEQ ID NO: 28) 
I CTGACCTTCG 
61 TGTTCAACGG 
89C (SEQ ID NO. 29) 
1 ATOCCTGGCT 
61 T CCCTCA GTT 
121 TCGTTTTCTG 
101C(SEQIDNO:30) 
1 GGCTGGGCAT 
61 CTGCCAGCCC 
121 CGTTAGCTTT 
112C (SEQ ID NO: 31) 
1 CCAACTCCTA 
161 CAATACTCTC 
114C(SEQIDN0: 32) 
1 CATGGATGAA 



GTOTGTTTAT 
CCATCACCCC 

CAAAGCAGGC 
GCGAGGTTTC 
CAGAGTGGA 

TAAGAGAGAT 
AATTACGTGG 
TTAOAGTCCT 

TATGGAATCC 

GGAAACAAAG 

CTGGCCAGCT 

ACOGTCTTTA 
GGGOOCTAGT 
TAGAACTTGT 

CACTCrCGTA 
ATTCATATTG 
AGAATATATC 

GAAGGCCTGA 
CGCACAGGTT 
GCCGGTGGAG 

AOAOTTTGAC 
AGCCGTGAGC 

GTGGATAGTG 
TGGGAGTGTG 
GTGATGTTGT 

CCCTCTCCTC 
GGCTCTGAAO 
CCCATAAGGT 

CCGCGATACA 
CTAAAATAAA 

TGTCTCATGG 



TCCTGTACAA 
AGTGCAATGG 

AACCCCCTTT 
CCCAACACCC 



ATCATTACAA 
CTAGCTGCTG 

GGCACTGCTG 
TCCTCTGCTT 



A ACCAA GTCT 
GCCTTT 

CCACTGOGGT 
CCCTGTGTGT 



GGGGCAGTCA 



CATGGCGGTT 
CGGGGTCTCA 



TTTGGTCTTA 

TATm ATTSiTl ' 

JAIUUAJUljl 
CTTAATATTG 


AAGGCTTCAT 

ilA.1 lui J 1A 

ATGTCCTAAC 


CATOAAAGTO 

TTA ilTi A. Art 

ACTGGGTCTG 


TACATGCATA 
atttta r* a nr* A 

CTTATGC 


AGAAGGGAAA 

GATATATCCT 

TCCCCAAGAT 


CAAGCACTGO 
CATGGCTCGA 
GTOACTCCAG 


ATAATTAAAA 
AATAAGAACA 
CCAGAAA 


ACAGCTGGGG 
ACGCCTGTGG 


TTCCTCTCCT 
CATCTGTGGC 
TCTGGAATTC 


GCCTCAGAGG 
AGCGAAGGTG 
C 


TCAGGAAGGA 
AAGGGACTCA 


GGTCTGGCAG 
CLTTGTCGCC 


TTTTTAGTTT 

AGCTGTCTCA 

CTAATACTTT 


AACAATATAT 
TTCTTTTTTT 
TTAAAA 


GTGTTGTGTC 
AATGCTCATA 


TTGGAAATTA 
TACAGTAGTA 


GGCOAGGGG 
GAGAGGGGCA 
AGCCACAAAA 


CCGRGGCTGG 
CTTCCTCTTG 


CCiGCGTCTC 
CTTAGGTTGG 


AGTCCTGGGA 
TGAGGATCTG 


CTGGAGCCGG 
OACGACTCCG 


ATACCTACTG 
GTGGGGAAGT 


CCGCTATGAC 
TCTGCGGCGA 


TCGGTCAGCG 
T 


in nuioiA 
GAAGTACTAC 
GCTAACAATA 


GCAAATGCTC 
TTAACTGTCT 
AGAATAC 


CCTCCTTAAG 
GTCCTGCTTG 


GTTATAGGGC 
GCTGTOGTTA 


CTCCATCCCC 
CCAAGCGCCG 
TGGAGTATCT 


ATACATCACC 
TCCGTGCCAC 
GC 


AGGTCTAATG 
GGTGGCTCTG 


TTTACAAACG 
AGTATTCCTC 


GACCCACAGA 
CATGAAGCAC 


GTGCCATCCC 


TGAGAGACCA 


GACOGCTCCC 


TGGGAAGGAA 


CATGGTACAT 


TTC 





•Repeated 3 times
^Repeated 2 times



Sequence analysis of the OC⁺ stromal cell⁻ cloned DNA
sequences revealed, in addition to the novel sequences, a
number of previously-described genes. The known genes
identified (including type 5 acid phosphatase, gelatinase B,
cystatin C (13 clones), Alu repeat sequences (11 clones),
creatine kinase (6 clones) and others) are summarized in
Table II. In situ hybridization (described below) directly
demonstrated that gelatinase B mRNA is expressed in multinucleated
osteoclasts and not in stromal cells. Although
gelatinase B is a well-characterized protease, its expression
at high levels in osteoclasts has not been previously
described. The expression in osteoclasts of cystatin C, a
cysteine protease inhibitor, is also unexpected. This finding
has not yet been confirmed by in situ hybridization. Taken
together, these results demonstrate that most of these identified
genes are osteoclast-expressed, thereby confirming the
effectiveness of the differential screening strategy for identifying
DNA encoding osteoclast-specific or -related gene
products. Therefore, novel genes identified by this method
have a high probability of being OC-specific or -related.

In addition, a minority of the genes identified by this
screen are probably not expressed by OCs (Table II). For
example, type III collagen (6 clones), collagen type I (1
clone), dermatan sulfate (1 clone), and type VI collagen (1
clone) are more likely to originate from the stromal cells or
from osteoblastic cells which are present in the tumor. These
cDNA sequences survive the differential screening process
either because the cells which produce them in the tumor in
vivo die out during the stromal cell propagation phase, or
because they stop producing their product in vitro. These
clones do not constitute more than 5-10% of all the
sequences selected by differential hybridization.

TABLE II

SEQUENCE ANALYSIS OF CLONES ENCODING KNOWN
SEQUENCES FROM AN OSTEOCLASTOMA cDNA LIBRARY

Clones with Sequence Homology to Collagenase Type IV          23 total
Clones with Sequence Homology to
  Type 5 Tartrate Resistant Acid Phosphatase                  14 total
Clones with Sequence Homology to Cystatin C                   13 total
Clones with Sequence Homology to Alu-repeat Sequences         11 total
Clones with Sequence Homology to Creatine Kinase               6 total
Clones with Sequence Homology to Type III Collagen             6 total
Clones with Sequence Homology to
  MHC Class II γ Invariant Chain                               5 total
Clones with Sequence Homology to MHC Class II β Chain          3 total
One or Two Clone(s) with Sequence Homology to Each
  of the Following:                                           10 total
    α1 collagen type I
    γ interferon inducible protein
    osteopontin
    (entry illegible in this copy)
    α globin
    β glucosidase/sphingolipid activator
    Human CAPL protein (Ca binding)
    Human EST 01024
    Type VI collagen
    Human EST 00553

UTP digoxygenin labelled cRNA probes. 

TABLE III

In Situ HYBRIDIZATION USING PROBES
DERIVED FROM NOVEL SEQUENCES

                        Reactivity with:
Clone               Osteoclasts     Stromal Cells

4B                       +                +
28B*                     +
37B                                       +
86B
87B
88C                      +                +
98B                      +                +
118B*
140B*                    +
198B*                    +
212B*                    +
Gelatinase B*            +



Example 5 — In situ Hybridization of OC-Expressed Genes

In situ hybridization was performed using probes derived
from novel cloned sequences in order to determine whether
the novel putative OC-specific or -related genes are
differentially expressed in osteoclasts (and not expressed in the
stromal cells) of human giant cell tumors. Initially, in situ
hybridization was performed using antisense (positive) and
sense (negative control) cRNA probes against human type
IV collagenase/gelatinase B labelled with 35S-UTP.

A thin section of human giant cell tumor reacted with the
antisense probe resulted in intense labelling of all OCs, as
indicated by the deposition of silver grains over these cells,
but failed to label the stromal cell elements. In contrast, only
minimal background labelling was observed with the sense
(negative control) probe. This result confirmed that
gelatinase B is expressed in human OCs.

In situ hybridization was then carried out using cRNA 
probes derived from 11/32 novel genes, labelled with 
digoxigenin UTP according to known methods. 

The results of this analysis are summarized in Table III.
Clones 28B, 118B, 140B, 198B, and 212B all gave positive
reactions with OCs in frozen sections of a giant cell tumor,
as did the positive control gelatinase B. These novel clones
therefore are expressed in OCs and fulfill all criteria for
OC-relatedness. 198B is repeated three times, indicating
relatively high expression. Clones 4B, 37B, 88C and 98B
produced positive reactions with the tumor tissue; however,
the signal was not well-localized to OCs. These clones are
therefore not likely to be useful and are eliminated from
further consideration. Clones 86B and 87B failed to give a
positive reaction with any cell type, possibly indicating very
low level expression. This group of clones could still be
useful but may be difficult to study further. The results of this
analysis show that 5/11 novel genes are expressed in OCs,
indicating that ~50% of novel sequences are likely to be
OC-related.
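The grouping of the eleven novel clones described above can be tallied as follows. This is a minimal sketch in Python; the category labels and variable names are illustrative assumptions, and the clone assignments simply restate the preceding paragraph.

```python
from collections import Counter

# Outcome of in situ hybridization for each novel clone, as described in the text.
in_situ_results = {
    "28B": "OC-localized",
    "118B": "OC-localized",
    "140B": "OC-localized",
    "198B": "OC-localized",
    "212B": "OC-localized",
    "4B": "positive, not localized to OCs",
    "37B": "positive, not localized to OCs",
    "88C": "positive, not localized to OCs",
    "98B": "positive, not localized to OCs",
    "86B": "no signal",
    "87B": "no signal",
}

counts = Counter(in_situ_results.values())

# Fraction of tested novel clones whose signal localized to osteoclasts.
oc_fraction = counts["OC-localized"] / len(in_situ_results)

if __name__ == "__main__":
    for category, n in counts.items():
        print(f"{category}: {n}")
    print(f"OC-localized fraction: {oc_fraction:.0%}")
```

Five of eleven clones is about 45%, which the text rounds to roughly 50%.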

To generate probes for the in situ hybridizations, cDNA
derived from novel cloned osteoclast-specific or -related
cDNA was subcloned into a BlueScript II SK(-) vector. The
orientation of cloned inserts was determined by restriction
analysis of subclones. The T7 and T3 promoters in the
BlueScript II vector were used to generate 35S-labelled
(35S-UTP, 850 Ci/mmol, Amersham, Arlington Heights, Ill.), or
UTP digoxygenin labelled cRNA probes.






*OC-expressed, as indicated by reactivity with antisense probe and lack of
reactivity with sense probe on OCs only.

In situ hybridization was carried out on 7 micron cryostat
sections of a human osteoclastoma as described previously
(Chang, L.-C. et al. Cancer Res. 49:6700 (1989)). Briefly,
tissue was fixed in 4% paraformaldehyde and embedded in
OCT (Miles Inc., Kankakee, Ill.). The sections were
rehydrated, postfixed in 4% paraformaldehyde, washed, and
pretreated with 10 mM DTT, 10 mM iodoacetamide, 10 mM
N-ethylmaleimide and 0.1 triethanolamine-HCl.
Prehybridization was done with 50% deionized formamide, 10 mM
Tris-HCl, pH 7.0, 1x Denhardt's, 500 mg/ml tRNA, 80
mg/ml salmon sperm DNA, 0.3M NaCl, mM EDTA, and
100 mM DTT at 45° C. for 2 hours. Fresh hybridization
solution containing 10% dextran sulfate and li ng/ml
35S-labelled or digoxygenin labelled RNA probe was
applied after heat denaturation. Sections were coverslipped
and then incubated in a moistened chamber at 45°-50° C.
overnight. Hybridized sections were washed four times with
50% formamide, 2x SSC, containing 10 mM DTT and 0.5%
Triton X-100 at 45° C. Sections were treated with RNase A
and RNase T1 to digest single-stranded RNA, then washed four
times in 2x SSC/10 mM DTT.

In order to detect 35S-labelling by autoradiography, slides
were dehydrated, dried, and coated with Kodak NTB-2
emulsion. The duplicate slides were split, and each set was
placed in a black box with desiccant, sealed, and incubated
at 4° C. for 2 days. The slides were developed (4 minutes)
and fixed (5 minutes) using Kodak developer D19 and
Kodak fixer. Hematoxylin and eosin were used as
counterstains.

In order to detect digoxygenin-labelled probes, a Nucleic
Acid Detection Kit (Boehringer-Mannheim, Cat. #1175041)
was used. Slides were washed in Buffer 1 consisting of 100
mM Tris/150 mM NaCl, pH 7.5, for 1 minute. 100 ul Buffer
2 (made by adding 2 mg/ml blocking reagent, as provided
by the manufacturer, to Buffer 1) was added to each slide. The
slides were placed on a shaker and gently swirled at 20° C.

Antibody solutions were diluted 1:100 with Buffer 2 (as
provided by the manufacturer). 100 ul of diluted antibody
solution was applied to the slides and the slides were then
incubated in a chamber for 1 hour at room temperature. The
slides were monitored to avoid drying. After incubation with
antibody solution, slides were washed in Buffer 1 for 10
minutes, then washed in Buffer 3 containing 2 mM
levamisole for 2 minutes.

After washing, 100 ul color solution was added to the
slides. Color solution consisted of nitroblue/tetrazolium salt
(NBT) (1:225 dilution) 4.5 ul, 5-bromo-4-chloro-3-indolyl
phosphate (1:285 dilution) 3.5 ul, levamisole 0.2 mg in
Buffer 3 (as provided by the manufacturer) in a total volume
of 1 ml. Color solution was prepared immediately before
use.
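Since the color solution is specified per 1 ml and applied at 100 ul per slide, the amounts for a given number of slides follow by simple scaling. The following Python sketch illustrates that arithmetic only; the function name, the dictionary keys, the abbreviation BCIP for 5-bromo-4-chloro-3-indolyl phosphate, and the 10% overage default are assumptions made for illustration and are not taken from the kit instructions.

```python
# Amounts per 1 ml of color solution, as listed in the text.
# BCIP = 5-bromo-4-chloro-3-indolyl phosphate (abbreviation used here for brevity).
PER_ML = {
    "NBT (1:225 dilution), ul": 4.5,
    "BCIP (1:285 dilution), ul": 3.5,
    "levamisole, mg": 0.2,
}


def color_solution(n_slides, ul_per_slide=100.0, overage=1.1):
    """Return (total volume in ml, reagent amounts) for n_slides.

    overage is a hypothetical safety factor for pipetting losses;
    set it to 1.0 to prepare exactly the volume applied.
    """
    total_ml = n_slides * ul_per_slide * overage / 1000.0
    amounts = {name: per_ml * total_ml for name, per_ml in PER_ML.items()}
    return total_ml, amounts


if __name__ == "__main__":
    total_ml, amounts = color_solution(12)
    print(f"Prepare {total_ml:.2f} ml of color solution in Buffer 3:")
    for name, amount in amounts.items():
        print(f"  {name}: {amount:.2f}")
```

With ten slides and no overage the function returns exactly the 1 ml recipe given in the text.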

After adding the color solution, the slides were placed in
a dark, humidified chamber at 20° C. for 2-5 hours and
monitored for color development. The color reaction was
stopped by rinsing slides in TE Buffer.

The slides were stained for 60 seconds in 0.25% methyl
green, washed with tap water, then mounted with
water-based Permount (Fisher).

Example 6 — Immunohistochemistry

Immunohistochemical staining was performed on frozen
and paraffin embedded tissues as well as on cytospin
preparations (see Table IV). The following antibodies were used:
polyclonal rabbit anti-human gelatinase antibodies; Ab110
for gelatinase B; monoclonal mouse anti-human CD68
antibody (clone KP1) (DAKO, Denmark); Mo1 (anti-CD11b)
and Mo2 (anti-CD14) derived from ATCC cell lines HB
CRL 8026 and TIB 228/HD44. The anti-human gelatinase B
antibody Ab110 was raised against a synthetic peptide with
the amino acid sequence EALMYPMYRFTEGPPLHK
(SEQ ID NO: 34), which is specific for human gelatinase B
(Corcoran, M. L. et al. J. Biol. Chem. 267:515 (1992)).

Detection of the immunohistochemical staining was
achieved by using a goat anti-rabbit glucose oxidase kit
(Vector Laboratories, Burlingame, Calif.) according to the
manufacturer's directions. Briefly, the sections were
rehydrated and pretreated with either acetone or 0.1% trypsin.
Normal goat serum was used to block nonspecific binding.
Incubation with the primary antibody for 2 hours or
overnight (Ab110: 1/500 dilution) was followed by either a
glucose oxidase labeled secondary anti-rabbit serum, or, in the
case of the mouse monoclonal antibodies, by reaction with
purified rabbit anti-mouse Ig before incubation with the
secondary antibody.

Paraffin embedded and frozen sections from osteoclastomas
(GCT) were reacted with a rabbit antiserum against
gelatinase B (antibody 110) (Corcoran, M. L. et al. J. Biol.
Chem. 267:515 (1992)), followed by color development with
glucose oxidase linked reagents. The osteoclasts of a giant
cell tumor were uniformly strongly positive for gelatinase B,
whereas the stromal cells were unreactive. Control sections
reacted with rabbit preimmune serum were negative.
Identical findings were obtained for all 8 long bone giant cell
tumors tested (Table IV). The osteoclasts present in three out
of four central giant cell granulomas (GCG) of the mandible
were also positive for gelatinase B expression. These
neoplasms are similar but not identical to the long bone giant
cell tumors, apart from their location in the jaws (Shafer, W.
G. et al., Textbook of Oral Pathology, W. B. Saunders
Company, Philadelphia, pp. 144-149 (1983)). In contrast,
the multinucleated cells from a peripheral giant cell tumor,
which is a generally non-resorptive tumor of oral soft tissue,
were unreactive with antibody (Shafer, W. G. et al., Textbook
of Oral Pathology, W. B. Saunders Company,
Philadelphia, pp. 144-149 (1983)).

Antibody 110 was also utilized to assess the presence of
gelatinase B in normal bone (n=3) and in Paget's disease, in
which there is elevated bone remodeling and increased
osteoclastic activity. Strong staining for gelatinase B was
observed in osteoclasts both in normal bone (mandible of a
2 year old), and in Paget's disease. Staining was again absent
in controls incubated with preimmune serum. Osteoblasts
did not stain in any of the tissue sections, indicating that
gelatinase B expression is limited to osteoclasts in bone.
Finally, peripheral blood monocytes were also reactive with
antibody 110 (Table IV).

TABLE IV

DISTRIBUTION OF GELATINASE B IN VARIOUS TISSUES

Samples                       Antibodies tested: Ab 110 (gelatinase B)

GCT frozen (n = 2)
    giant cells
    stromal cells
GCT paraffin (n = 6)
    giant cells
    stromal cells
central GCG (n = 4)
    giant cells
    stromal cells
peripheral GCT (n = 4)
    giant cells
    stromal cells
Paget's disease (n = 1)
    osteoclasts
    osteoblasts
normal bone (n = 3)
    osteoclasts
    osteoblasts
monocytes (cytospin)

+(%)

Distribution of gelatinase B in multinucleated giant cells, osteoclasts,
osteoblasts and stromal cells in various tissues. In general, paraffin embedded
tissues were used for these experiments; exceptions are indicated.

Equivalents 

Those skilled in the art will recognize, or be able to 
ascertain using no more than routine experimentation, many 
equivalents to the specific embodiments described herein. 
Such equivalents are intended to be encompassed by the 
following claims. 



SEQUENCE LISTING 



( 1 ) GENERAL INFORMATION:

( i i i ) NUMBER OF SEQUENCES: 34

( 2 ) INFORMATION FOR SEQ ID NO: 1:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 170 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
CCAAATATCT A AGTTT ATTC CTTOOATTTC T AG TC AC AG C TCTTG A ATTT GGTG ATGTCA 60 
AATOTTTCTA OOOTTTTTTT A OTTTGTTTT T ATTG A A AAA TT7AATTATT TATCCTATAG 120 
GTGAT ATTCT CTTTGAATAA ACCTATAATA GAAAATACCA GCACACAACA 170 

( 2 ) INFORMATION FOR SEQ ID NO: 2:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 63 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
CTCTCAACCT GCATATCCTA AAAATOTCAA AATGCTOCAT CTGGTT A ATG TCGGGGTAOC 60 
GGC 6 3 

( 2 ) INFORMATION FOR SEQ ID NO: 3:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 163 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
CTTCCCTCTC TTGCT TCCCT TTCCCAACCA GACGTGCTC A CTCCATOGCC ACCGCCACCA 60 
CAGGCCCACA GGGAOTACTG CCAGACTACT CCTGATGTTC TCTTAAGGCC CAOOOAOTCT 120 
CAACCAOCTG GTGGTGA ATG CTGCCTGGCA CGGGACCCCC CCC 163 

( 2 ) INFORMATION FOR SEQ ID NO: 4:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 173 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
TTTTATTTGT A AAT AT ATGT ATTAC A TCCC TAG A AAA AG A ATCCC AGGAT TTTCCCTCCT 60 
OTGTCTTTTC OTCTTOCTTC TTCATGGTCC ATG A TGCC AG CTGACGTTOT C AG T A C A ATG 120 
AAA C C A A A C T GGCCGG ATGO AAGCAOATT A TTCTOCCATT TTTCCAGGTC TTT 173 

( 2 ) INFORMATION FOR SEQ ID NO: 5:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 197 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
GOCTGOACAT GGGTCCCCTC CACGTCCCTC AT ATCCC C AG GCACACTCTG GCCTCAGGTT 6 0 

TTGCCCTCGC CATGTC ATCT ACCTGGAGTG OGCCCTCCCC TTCTTCAGCC T T O A A T C A A A 120 
AGCCACTTTG TTACGCGAGG ATTTCCCAGA CCACTCATCA CA TTA A A AAA TATTTTOAAA 110 
ACAAAAAAAA A A A A A A A 197 

( 2 ) INFORMATION FOR SEQ ID NO: 6:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 132 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 6:
TTCACAAAGC TGTTTATTTC CACCAATAAA TAGTATATCG TOATTGGGGT TTCTATTTAT 60 
AAGAGTAOTG GCTATTATAT OOOGTATCAT OTTGATGCTC ATA A AT AGTT CATATCTACT 120 
TAATTTGCCT TC >32 

< 2 ) INFORMATION FOR SEQ ID NO:7: 

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 75 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GAAGAGAGTT OTATGTACAA CCCCAACAGG CAAOOCAGCT AAATGCAGAO GGTACAGAGA 60 
GATCCCGAGG GAATT 7S 

( 2 ) INFORMATION FOR SEQ ID NO: 8:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 131 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 8:

GOATGGAAAC ATOTAOAAOT CCAGAGAAAA ACAATTTTAA AAAAAOGTGO AAA AGTT A C G 60 

/ 

GCAAACCTGA CATTTCACCA TAAAATCTTT AGTTAGAAGT GAGAGAAAGA AOAGGGAGGC 120 

TGOTTGC TOT TCCACGTATC AATAGGTTAT C 131 

( 2 ) INFORMATION FOR SEQ ID NO: 9:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 141 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

TTCTTOATCT TTACAACACT ATGAATAOOO AAA AA AG A A A AAACTGTTCA A A AT A A A ATG . «0 

TAGGAGCCGT OCTTTTGGAA TGCTTGAGTO AGGAGCTCAA CAACTCCTCT C CCA AG A A AG 120 

CAATGATAAA ACTTQACAAA A 141 

( 2 ) INFORMATION FOR SEQ ID NO: 10:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 162 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
ACCCATTTCT AACAATTTTT ACTGTAAAAT TTTTGOTCAA AGTTCTAAGC TTAATCACAT 60 
CTC A A AG A A T AGAGGCAATA TATAOCCCAT CTTACTAOAC ATACAOTATT AAACTGGACT 120 
GAATATGAGG AC A AGCTCT A CTCGTCATTA AACCCCTCAG AA 1«2 

( 2 ) INFORMATION FOR SEQ ID NO: 11:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 157 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
ACATATATTA ACAGCATTCA TTTGCCCAAA ATCTACACGT TTGTAGAATC CTACTGTATA 60 
TAAAGTGGGA ATGTATCAAG TATACACTAT GAAAGTGCAA ATAACAAOTC A AOOTTAOAT 120 
TAACTTTTTT TTTTTACATT ATAAAATTAA CTTGTTT 152 

( 2 ) INFORMATION FOR SEQ ID NO: 12:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 75 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
CCAAATTTCT CTGGAATCCA TCCTCCCTCC CATCACCATA GCCTCGAGAC GTCATTTCTG 60 
TTTGACTACT CCAGC 7 * 
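Because the listing describes each sequence as double stranded, the complementary strand of an entry can be derived mechanically from the strand shown. The following is a minimal Python sketch using SEQ ID NO: 12 as it reads in this copy; the function and variable names are illustrative, and the sketch assumes the transcription above is free of scanning errors.

```python
# SEQ ID NO: 12 as printed above, in groups of ten bases.
SEQ_ID_12_LINES = [
    "CCAAATTTCT CTGGAATCCA TCCTCCCTCC CATCACCATA GCCTCGAGAC GTCATTTCTG",
    "TTTGACTACT CCAGC",
]

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def join_listing_lines(lines):
    """Remove the spacing used in the listing and return one string."""
    return "".join("".join(line.split()) for line in lines)


def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


seq12 = join_listing_lines(SEQ_ID_12_LINES)
seq12_complement = reverse_complement(seq12)

if __name__ == "__main__":
    print(len(seq12), "bases")
    print(seq12_complement)
```

Applying `reverse_complement` twice returns the original strand, which is a quick consistency check on the table.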

( 2 ) INFORMATION FOR SEQ ID NO: 13:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 124 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
AACTAACCTC CTCOGACCCC TGCCTCACTC ATTTACACCA ACCACCCAAC T ATCT ATA A A 60
CCTGAGCCAT GGCCATCCCT T A TG AGCGGC CCAGTOATTA TAOGCTTTCO CTCTAAOATA 120

( 2 ) INFORMATION FOR SEQ ID NO: 14:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 151 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
ATTATTATTC TTTTTTTATG TTAGCT TAG C CATGCAAAAT TTACTOGTGA AGCAGTTAAT 
A AA A C AC A C A TCCCATTG A A GGGTTTTGTA C A TTTCAGTC CTTACA AATA AC A A AGC AAT 
GATAAACCCO GCACGTCCTG ATAGGAAATT C 



60 
1 20 
1 5 1 



( 2 ) INFORMATION FOR SEQ ID NO: 15:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 105 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
CCTGACACAA AC ATGC ATTC OTTTTATTCA TAAAACACCC TGGTTTCCTA A A A C A AT A C A 
AACAGCATGT TCATCAOCAO GAAOCTGGCC GTCGGCACGG OOOCC 

< 2 ) INFORMATION FOR SEQ ID NO-.I6: 

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 246 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 16:



6 0 
I 0 3 



ATAGGTT AO A TTCTCATTC A CGOGACTAGT TAGCT TT A AC CACCCTAGAO G ACTAGGGTA 60
ATCTGACTTC TCACTTCCTA ACTTCCCTCT T ATATC CTC A AGGT AGA AAT GTCT ATOTTT 120
TC TA CTC C A A TTC ATAAATC TATTCATAAG TCTTTGGTAC AAOTT AC ATG AT AA A A AGAA 180
ATGTOATTTO TCTTCCCTTC TTTGC A CTTT TGAAATAA AG TATTTATCTC CTOTCTACAG 240
TTTAAT 246



( 2 ) INFORMATION FOR SEQ ID NO: 17:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 188 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
GTC C ACT A T A AAOGAAAGCG TTAAGTCOOT AAGCTAGACC ATTGT A A ATA TCTTTTATGT 
CCTCTAGATA AAA C ACCCG A TT AACAG ATG TTAACCTTTT ATGTTTTG AT TTGCTTTAAA 
AATGGCCTTC TACACATTAO CTCCAGCTAA AAACACACAT TGAOACCTTA GAOOATAGTC 



6 0 
1 2 0 

1 to 
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TCTGGACC * 18 8 

( 2 ) INFORMATION FOR SEQ ID NO: 18:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 212 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 18:

CCACTTCCAA OGGAGTTOOT OTGCT ATTTT TGAAOCAG AT GTGOTOATAC TOAGATTOTC 60 

TCTTCAOTTT CCCCATTTGT TTGTG CTTC A AATCATCCTT CCTACTTTGC TTCTCTCCAC 120 

CCATOACCTT TTTCACTGTO OCCATCAAOG ACTTTCCTGA C AGCTTGTGT ACTCTTAOGC 180 

TAAOAGATGT GACTACAGCC TGCCCCTOAC TO 212 

( 2 ) INFORMATION FOR SEQ ID NO: 19:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 203 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 19:

TGTTAGTTTT TAGGAAGGCC TOTCTTCTGG GAGTGAGGTT TATTAGTCCA CTTC TTOG AG 60 

CTAGACGTCC T AT AGTTAGT CACTGGGGAT GGTG A A AG AG GG AG A AG A GG AAGGGCGAAG 120 

GG A AGGGCTC TTTOCTAGTA TCTCCATTTC TAGAAGATGG TTTAG ATG AT AACCACAGGT 180 

CTATATGAGC AT AGT AAGGC TGT 203 

( 2 ) INFORMATION FOR SEQ ID NO: 20:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 177 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
CCTATTTCTG ATCCTOACTT TGG AC AAGGC CCTTCAGCCA GAAGACTGAC AAAGTCATCC 60 
TCCOTCTACC AOAGCGTGCA CTTGTOATCC TAAAATAAGC TTCATCTCCC GCTGTGCCTT .120 
GGGTGG A AGG GGCAGGATTC TCCAGCTGCT TTTOCATTTC TCTTCCTAAA TTTCATT 177 

( 2 ) INFORMATION FOR SEQ ID NO: 21:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 106 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
CGCAGCGTAC GTGTGTTT AT TC CTGT AC A A ATCATTACAA AACCAACTCT OOOGCAOTCA 60
CCOCCCCCAC CCATCACCCC AOTGC A ATG G CTAGCTCCTG GCCTTT 106

( 2 ) INFORMATION FOR SEQ ID NO: 22:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 139 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
TTAOTTCAOT CAAAOCAOOC AACCCCCTTT OOCACTGCTO CCACTGOOOT CATOOCCOTT 60 
GTGGC AO CTC CGG A G GTTTC CCCAACACCC TCC TCTGCTT CCCTGTCTGT CGOGCTCTCA 120 
GGAGCTOACC C A 0 A O TOO A 139 

( 2 ) INFORMATION FOR SEQ ID NO: 23:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 177 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
GCTOAATOTT TAAGAGAGAT TTTCGTCTTA AAGOCTTCAT CATOAAAGTG TACATGCATA 60 
TGC A AGTGTG AATTACGTGG TA TGG ATGGT TGCTTOT TTA TTAA CT A A A G ATCTACAGC A J 20 

AACTGCCCCT TTAOAGTCCT CTTAATATTO ATGTCCTAAC ACTGGGTCTG CTTATGC 177 

( 2 ) INFORMATION FOR SEQ ID NO: 24:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 167 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 24:
CCC AGTGGGA TATOOAATCC AGAAGGGAAA CAAOCACTGG ATAATTAAAA ACAOCTGCGG 60 
A G A A A A C TGG OG A A A C A A A G GATATATCCT C A TGGCTCG A A A T A AG A A C A ACGCCTGTCG 120 
CAT TGCC AAC CTGOCCAGCT TCCCCAAGAT GTGACTCCAG CCAGAAA 167 

( 2 ) INFORMATION FOR SEQ ID NO: 25:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 151 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
GCC AGGGCGG ACCGTCTTTA TTCCTCTCCT CCCTCAGAGG TCAOGAAGGA CGTCTGG C A G 60 
0ACCTOCAGT GGGCCCTAOT CATCTGTOGC AGCG A AGGTG A AGGG AC T CA CCTTGTCGCC 120 
COTOCCTOAO TAOAACTTOT TCTGG A ATTC C 15 1 

( 2 ) INFORMATION FOR SEQ ID NO: 26:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 156 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

AACTCTTTCA CACTCTOOTA TTTTTAOTTT AACAATATAT CTGTTCTOTC TTGGAAATTA 60 

OTTCATATCA ATTCATATTG ACCTGTCTCA TTCTTTTTTT A A TG G T C AT A TACAOTAOTA 120 

TTC A ATT ATA A G A A T A T ATC CTAATACTTT T T A A A A JJfi 

( 2 ) INFORMATION FOR SEQ ID NO: 27:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 150 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 27:

GGATA AGAAA G A AG GCCTG A GGOCTAGGGG CCGGGGCTGG CCTGCGTCTC AGTCCTGGGA 60 
CGCAGCAGCC CGCACAGOTT C AG AGGGGC A CTTCCTCTTG CTTAGGTTCG TGAGGATCTG 120 
GTCCTGGTTG GCCGGTGGAG AG C C A C A AA A 150 

( 2 ) INFORMATION FOR SEQ ID NO: 28:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 212 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 28:

GCACTTGGAA GGGAGTTGGT OTCCTATTTT TGAAGCAGAT GTGGTGATAC TGAOATTGTC 60 

TGTTCAGTTT CCCCATTTGT TTOTOCTTC A A A TGATCCT T CCTACTT TGC TTCTCTCCAC 120 

CCATGACCTT TTTCACTGTG GCCATCAAGG AC TTTCC TG A CACCTTOTC T ACTCTTAGGC I ft 0 

TAAGAGATGT GACTACAGCC TOCCCCTOAC TG 2 12 

( 2 ) INFORMATION FOR SEQ ID NO: 29:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 157 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 29:

ATCCCTGGCT GTGOATAGTG CTTTTGTOTA GCAAATG CTC CCTCCTTAAC GTTATAGGGC 60 

TCCCTGAGTT TGGGAGTGTC C A A GT A C T A C TTA ACTGTCT CTCCTGCTTG G CTCTCOT T A 120 

TCOTTTTCTO OTGATGTTOT OCTAACAATA AG A AT A C 137 

( 2 ) INFORMATION FOR SEQ ID NO: 30:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 152 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
CGCTGCOCAT CCCTCTCCTC CTCCATCCCC ATACATCACC ACGTCTAATC TTTACA AACG 
GTOCCAGCCC COCTCTGAAG CCAAGGOCCG TCCOTGCCAC OOTOOCTGTO AGTATTCCTC 
CGTTAOCTTT CCC AT A AGGT TOGAGTATCT GC 

( 2 ) INFORMATION FOR SEQ ID NO: 31:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 90 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
CCAACTCCTA CCGCGATACA G A C C C A C AG A GTCCCATCCC TG AG AG AC C A OACCOCTCCC 
CA A TACTCTC CTAAAATAAA CATGAAOCAC 

( 2 ) INFORMATION FOR SEQ ID NO: 32:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 43 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

CATGGATGAA TGTCTCATGG TCGGAAGGAA CATOOTACAT TTC 
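A scanned sequence line can be screened against the length declared in its header before it is relied upon. The following Python sketch, offered only as an illustration, removes the spacing from the SEQ ID NO: 32 line as it appears in this copy, compares the residue count with the declared length of 43, and reports any character outside A, C, G and T. The function name and the returned keys are assumptions; the sketch flags suspect characters but does not attempt to guess the intended base.

```python
from collections import Counter


def check_sequence_line(raw, declared_length):
    """Screen one scanned nucleotide line for length and character problems."""
    # Drop the spaces that separate (or, after scanning, split) the ten-base groups.
    residues = "".join(raw.split()).upper()
    # Anything other than A, C, G, T is treated as a possible scanning artifact.
    suspects = Counter(ch for ch in residues if ch not in "ACGT")
    return {
        "length": len(residues),
        "length_ok": len(residues) == declared_length,
        "suspect_characters": dict(suspects),
    }


SEQ_ID_32_RAW = "CATGGATGAA TGTCTCATGG TCGGAAGGAA CATOOTACAT TTC"
report = check_sequence_line(SEQ_ID_32_RAW, declared_length=43)

if __name__ == "__main__":
    print(report)
```

For this line the length agrees with the header, while two occurrences of the letter O are flagged for comparison against the original filing.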

( 2 ) INFORMATION FOR SEQ ID NO: 33:

( i ) SEQUENCE CHARACTERISTICS:
( A ) LENGTH: 2334 base pairs
( B ) TYPE: nucleic acid
( C ) STRANDEDNESS: double
( D ) TOPOLOGY: linear

( i i ) MOLECULE TYPE: DNA (genomic)

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

AG AC ACCTCT CCCCTCACCA TG ACCCTCTC GCAGCCCCTC GTCCTGGTOC TCCTOGTOCT 

OGGC TGCTGC TTTGCTGCCC CCAGACAOCG CCAGTCCACC CTTGTGCTCT TCCCTGGAGA 

C C TG A G A A C C AATCTCACCG ACACGCAGCT GGCAGAOOAA TACCTOTACC OCTATOOTTA 

CACTCGGGTG GCAGAGATGC GTOOAGAOTC GAAATCTCTG GGCCCTGCGC TGCTCCTTCT 

CCAGAAGCAA CTOTCCCTOC CCOAGACCGG TOAOCTOOAT AOCOCCACGC TOAAOOCCAT 

GCCAACCCCA CGCTOCCGGG TCCCAGACCT GGGCAGATTC CAAACCTTTG AOGGCG ACCT 

CAAOTCOCAC CACCACAACA TCACCTATTO OATCCAAAAC TACTCOGAAG ACTTGCCOCO 

GGCGGTG ATT OACGACGCCT TTGCCCGCGC CTTCGCACTG TGGACCGCGG TCACGCCOCT 

CACCTTCACT COCGTG T AC A GCCGGOACOC AG AC ATCGTC ATCCAGTTTG OTOTCGCGGA 

GC ACGGAG AC CCGTATCCCT T CG A C G GO A A GG A CGGGC T C CTGGCACACG CCTTTCCTCC 

TOOCCCCOOC ATTCAGGGAG ACOCCCATTT CGACGATGAC GAGTTGTOGT CCCTGGGCAA 



60 
I 2 0 
1 5 2 



60 
9 0 



4 3 



6 0 
I 2 0 

1 S 0 

2 4 0 

3 0 0 

3 6 0 

4 2 0 
4 t 0 
54 0 
6 00 
6 6 0 
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GOCCOTCCTG C T TCC A ACT C OGTTTGG AAA CGCAGATGGC GCGGCCTGCC ACTTCCCCTT 

CATCTTCGAC GOCCGCTCCT ACTCTCCCTC CACCACCGAC GGTCGCTCCG ACGGGTTCCC 

CTGGTGCAOT ACCACGGCCA ACT ACG A C AC CGACGACCGO TTTCGCTTCT CCCCCAOCOA 

OAGACTCTAC ACCCOOOACG GCAATOCTGA TOOOAAACCC TOCCAOTTTC CATTCATCTT 

CCA AGGCC A A TCCTACTCCG CCTGCACCAC GGA CGGTCGC TCCG ACGGCT ACCOCTGQTG 

CGCCACCACC GCCAACTACG ACCGGOA CA A GCTCTTCGGC TTCTGCCCOA CCCGAGCTGA 

CTCGACOOTG ATCGOOOOCA ACTCGGCGGG GGAGCTCTGC GTCTTCCCCT TCACTTTCCT 

GGGTA AGG AG TACTCGACCT GTACCAGCGA GGGCCGCGGA GATGGGCGCC TCTGOTGCGC 

TACCACCTCO A ACTTTGAC A GCGAC AAGAA OTOCGOCTTC TGCCCGOACC AAGGATACAG 

TTTGTTCCTC GTGOCOOCGC ATGAOTTCGG CCACGCOCTO GGCTT A GAT C ATTCCTC AGT 

GCCGGAGGCG CTCATCTACC CTATGT ACCG CTTCACTGAC GGGCCCCCCT TGCATAAOGA 

CGACGTGAAT OGCATCCGCC ACCTCTATGC TCCTCGCCCT C A ACCTG AO C CACGGCCTCC 

AACCACCACC ACACCGCACC CCACGCCTCC CCCGACGGTC TGCCCCACCG GACCCCCCAC 

TGTCCACCCC TCAGACCGCC CCACAGCTCC CCCCACAGGT CCCCCCTCAC CTGOCCCCAC 

AOOTCCCCCC ACTOCTGCCC CTTCTACGGC C ACTACTGTG CCTTTG AGTC CGGTGG ACGA 

TGCCTCC A AC GTG A AC ATCT TCG ACGCCAT CGCGGAGATT GGGAACCACC TGTATTTGTT 

CAAGGATGGG AAGTACTGGC CATTCTCTOA GOGCAOGGGG AGCCGGCCGC AGGGCCCCTT 

CCTTATCGCC G AC A AGTGGC CCGCGCTGCC CCCCAAGCTG CACTCGGTCT TTGAGG AGCC 

CCTCTCCAAG AAGCTTTTCT TCTTCTCTGG GCGCCAGGTG TGGGTGT A C A CAGGCG CGTC 

GGTOCTGOGC CCOAOGCCTC TGGACAAGCT GGGCCTGGG A GCCGACGTGG CCCAGGTGAC 

CGGGGCCCTC CGGAGTGGCA GGGGG A AG A T GCTGCTGTTC AGCGGGCCGC GCCTCTGOAG 

GTTCOACGTG AAGGCGCAGA TGGTGG ATCC CCGGAGCGCC AOCOAGOTGG ACCGGATGTT 

CCCCCCGGTG CCTTTGOACA CGCACG ACGT C T TCC AG T A C CG AO A G A A AG CCTATT TCTG 

CCAGGACCGC TTCT ACTGGC GCGTG AGTTC CCGGAGTG AG TTGAACCAGG TOGA CC A AOT 

GCGCTACCTG ACCTATOACA TCCTGCAGTG CCCTGAGGAC TAGGOCTCCC OTCCTGCTTT 

GCAGTG CC AT GTAAATCCCC ACTGGGACC A ACCCTGGGO A AOCAGCCAGT TTGCCGGATA 

CAAACTGGTA TTCTGTTCTG G AGG A AAGGG AGG AGTGG AG GTGGGCTGGG CCCTCTCTTC 

TCACCTTTOT TTTTTGTTGO AGTGTTTC T A A T A A AC TTG G ATTCTCTAAC CTTT 

( 2 ) INFORMATION FOR SEQ ID NO: 34:

( i ) SEQUENCE CHARACTERISTICS:



7 20 
.710 
I 4 0 
9 0 0 
9 6 0 
10 2 0 
I 0 t 0 
114 0 
12 0 0 

12 4 0 

13 2 0 
1)10 
I 4 4 0 
15 0 0 

15 6 0 

14 2 0 

16 10 

17 40 

18 0 0 

18 60. 

19 2 0 
19 10 
2 0 4 0 
2 10 0 
2 16 0 
2 2 2 0 
2 2 8 0 
2 3 3 4 



( A ) LENGTH: 18 amino acids
( B ) TYPE: amino acid
( C ) STRANDEDNESS: single
( D ) TOPOLOGY: unknown

( i i ) MOLECULE TYPE: peptide

( x i ) SEQUENCE DESCRIPTION: SEQ ID NO: 34:

Glu Ala Leu Met Tyr Pro Met Tyr Arg Phe Thr Glu Gly Pro Pro Leu
1               5                   10                  15

His Lys
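The three-letter form of SEQ ID NO: 34 can be cross-checked against the one-letter peptide sequence EALMYPMYRFTEGPPLHK given in Example 6. The following Python sketch performs that conversion; the table contains the standard three-letter and one-letter codes for the twenty common amino acids, and the function and variable names are illustrative.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter_sequence):
    """Convert a space-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_sequence.split())


SEQ_ID_34 = (
    "Glu Ala Leu Met Tyr Pro Met Tyr Arg Phe Thr Glu Gly Pro Pro Leu "
    "His Lys"
)
seq34_one_letter = to_one_letter(SEQ_ID_34)

if __name__ == "__main__":
    print(seq34_one_letter, len(seq34_one_letter))
```

The result has 18 residues and matches the peptide stated in Example 6.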



We claim:

1. An isolated osteoclast-specific or -related DNA
sequence, or its complementary sequence, the DNA
sequence comprising a nucleic acid sequence selected from
the group consisting of:

a) DNA sequences set forth in the group consisting of
SEQ ID NOS. 12, 14, 16 and 17, or their complementary
strands; and

b) DNA sequences which hybridize under standard
conditions to the DNA sequences defined in a).

2. A DNA construct capable of replicating, in a host cell,
osteoclast-specific or -related DNA, said construct
comprising:

a) a DNA sequence of claim 1; and

b) sequences, in addition to said DNA sequence,
necessary for transforming or transfecting a host cell, and for
replicating, in a host cell, said DNA sequence.

3. A DNA construct capable of replicating and expressing,
in a host cell, osteoclast-specific or -related DNA, said
construct comprising:

a) a DNA sequence of claim 2; and

b) sequences, in addition to said DNA sequence,
necessary for transforming or transfecting a host cell, and for
replicating and expressing, in a host cell, said DNA
sequence.

4. A cell stably transformed or transfected with a DNA 
construct according to claim 3. 

5. A cell stably transformed or transfected with a DNA 
construct according to claim 4. 
1 CCCAGGGCGC CGTAGGCGGT GCATCCCGTT CGCGCCTGGG GCTGTGGTCT
51 TCCCGCGCCT GAGGCGGCGG CGGCAGGAGC TGAGGGGAGT TGTAGGGAAC
101 TGAGGGGAGC TGCTGTGTCC CCCGCCTCCT CCTCCCCATT TGCGCGCTCC
151 CGGGACCATG TCCGCGCTGG CGGGTGAAGA TGTCTGGAGG TGTCCAGGCT
201 GTGGGGACCA CATTGCTCCA AGCCAGATAT GGTACAGGAC TGTCAACGAA
251 ACCTGGCACG GCTCTTGCTT CCGGTGAAAG TGATGCGCAG CCTGGACCAC

301 CCCAATGTGC TCAAGTTCAT TGGTGTGCTG TACAAGGATA AGAAGCTGAA 
351 CCTGCTGACA GAGTACATTG AGGGGGGCAC ACTCAAGGAC TTTCTGCGCA 
401 GTATGGATCC GTTCCCCTGG CAGCAGAAGG TCAGGTTTGC CAAAGGAATC
451 GCCTCCGGAA TGGACAAGAC TGTGGTGGTG GCAGACTTTG GGCTGTCACG
501 GCTCATAGTG GAAGAGAGGA AAAGGGCCCC CATGGAGAAG GCCACCACCA 
551 AGAAACGCAC CTTGCGCAAG AACGACCGCA AGAAGCGCTA CACGGTGGTG 
601 GGAAACCCCT ACTGGATGGC CCCTGAGATG CTGAACGGAA AGAGCTATGA 
651 TGAGACGGTG GATATCTTCT CCTTTGGGAT CGTTCTCTGT GAGATCATTG 
701 GGCAGGTGTA TGCAGATCCT GACTGCCTTC CCCGAACACT GGACTTTGGC
751 CTCAACGTGA AGCTTTTCTG GGAGAAGTTT GTTCCCACAG ATTGTCCCCC 
801 GGCCTTCTTC CCGCTGGCCG CCATCTGCTG CAGACTGGAG CCTGAGAGCA 
851 GACCAGCATT CTCGAAATTG GAGGACTCCT TTGAGGCCCT CTCCCTGTAC 
901 CTGGGGGAGC TGGGCATCCC GCTGCCTCCA GAGCTGGAGG AGTTGGACCA 
951 CACTGTGAGC ATGCAGTACG GCCTGACCCG GGACTCACCT CCCTAGCCCT 
1001 GGCCCAGCCC CCTGCAGGGG GGTGTTCTAC AGCCAGCATT GCCCCTCTGT 
1051 GCCCCATTCC TGCTGTGAGC AGGGCCGTCC GGGCTTCCTG TGGATTGGCG 
1101 GAATGTTTAG AAGCAGAACA AACCATTCCT ATTACCTCCC CAGGAGGCAA 
1151 GTQGGCGCAG CACCAGGGAA ATGTATCTCC ACAGGntTC GGGCCTAGTT 
1201 ACTGTCTGTA AATCCAATAC TTGCCTGAAA GCTGTGAAGA AGAAAAAAAC 
1251 CCCTGGCCTT TGGGCCAGGA GGAATCTGTT ACTCGAATCC ACCCAGGAAC 
1301 TCCCTGGCAG TGGATTGTGG GAGGCTCTTG CTTACACTAA TCAGCGTGAC 
1351 CTGGACCTGC TGGGCAGGAT CCCAGGGTGA ACCTGCCTGT GAACTCTGAA 
1401 GTCACTAGTC CAGCTGGGTG CAGGAGGACT TCAAGTGTGT GGACGAAAGA 
1451 AAGACTGATG GCTCAAAGGG TGTGAAAAAG TCAGTGATGC TCCCCCTTTC 
1501 TACTCCAGAT CCTGTCCTTC CTGGAGCAAG GTTGAGGGAG TAGGTTTTGA 
1551 AGAGTCCCTT AATATGTGGT GGAACAGGCC AGGAGTTAGA GAAAGGGCTG 
1601 GCTTCTUnT ACCTCCTCAC TGGCTCTAGC CAGCCCAGGG ACCACATCAA 
1651 TGTGAGAGGA AGCCTCCACC TCATGTTTTC AAACTTAATA CTGGAGACTG 
1701 GCTGAGAACT TACGGACAAC ATCCTTTCTG TCTGAAACAA ACAGTCACAA 
1751 GCACAGGAAG AGGCTGGGGG ACTAGAAAGA GGCCCTGCCC TCTAGAAAGC 
1801 TCAGATCTTC GCTTCTGTTA CTCATACTCG GG1GGGCTCC TTAGTCAGAT 
1851 GCCTAAAACA TTTTGGCTAA AGCTCGATGG GTTCTGGAGG ACAGTGTOGC 
1901 TTGTCACAGG CCTAGAGtCT GAGGGAGGGG AGTGGGAGTC TCAGCAATCT 
1951 CTTGGTCTTG GCTTCATGGC AACCACTGCT CACCCTTCAA CATGCCTGGT 
2001 TTAGGCAGCA GCTTGGGCTG GGAAGAGGTG GTGGCAGAGT CTCAAAGCTG 
2051 AGATGCTGAG AGAGATAGCT CCCTGAGCTG GGCCATCTGA CTTCTACCTC
2101 CCATGTTTGC TCTCCCAACT CATTAGCTCC TGGGCAGCAT CCTCCTGAGC
2151 CACATGTGCA GGTACTGGAA AACCTCCATC TTGGCTCCCA GAGCTCTAGG 
2201 AACTCTTCAT CACAACTAGA TTTGCCTCTT CTAAGTGTCT ATGAGCTTGC 
2251 ACCATATTTA ATAAATTGGG AATGGGTnG GGGTATTAAA AAAAAAAAAA 
2301 AAAAAAAAAA AAAAAAAAAA (SEQ ID NO:1)

FIG.1A 
FEATURES:
5'UTR: 1-228
Start Codon: 229
Stop Codon: 994
3'UTR: 997
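The feature coordinates above can be checked for internal consistency with a short sketch. It is illustrative only and not part of the original figure. It assumes the start and stop positions refer to the first base of each codon, and the 60-base fragment is a transcription of positions 229-288 of SEQ ID NO:1 in which ambiguous scanned characters are read as G (an assumption). The names `orf_length_in_codons`, `translate`, and `FRAGMENT` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

START = 229  # first base of the start codon, from FEATURES
STOP = 994   # first base of the stop codon, from FEATURES


def orf_length_in_codons(start, stop):
    """Count the sense codons from the start codon up to (not including) the stop codon."""
    span = stop - start
    if span % 3 != 0:
        raise ValueError("start and stop codons are not in the same reading frame")
    return span // 3


def translate(dna):
    """Translate a DNA string codon by codon, ignoring any trailing partial codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


# Assumed transcription of positions 229-288 of SEQ ID NO:1.
FRAGMENT = "ATGGTACAGGACTGTCAACGAAACCTGGCACGGCTCTTGCTTCCGGTGAAAGTGATGCGC"

print(orf_length_in_codons(START, STOP))  # number of encoded residues
print(translate(FRAGMENT))                # first 20 residues of the open reading frame
```

Under these assumptions the open reading frame spans 255 codons, which agrees with the 255-residue length of SEQ ID NO:2, and the fragment translates to the first 20 residues of that sequence.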



Homologous proteins:

Top 10 BLAST Hits
                                                                   Score  E
CRA|1000682328847 /altid=gi|8051618 /def=ref|NP_057952.1| LIM d    485    e-136
CRA|18000005015874 /altid=gi|5031869 /def=ref|NP_005560.1| LIM     485    e-136
CRA|88000001156379 /altid=gi|7434382 /def=pir|X5814 LIM motif.     469    e-131
CRA|88000001156378 /altid=gi|7434381 /def=pir|X5813 LIM motif.     469    e-131
CRA|18000005154371 /altid=gi|7428032 /def=pir|JE0240 LIM kinas.    469    e-131
CRA|18000005126937 /altid=gi|6754550 /def=ref|KP_034848| LIM .     469    e-131
CRA|18000005127186 /altid=gi|2804562 /def=dbj|BAA24491.1 (AB00.    469    e-131
CRA|18000005127185 /altid=gi|2804553 /def=dbj|BAA24489.1 (AB00.    469    e-131
CRA|18000005004416 /altid=gi|2143830 /def=pir|1 178847 LIM motif.  468    e-131
CRA|18000005004415 /altid=gi|1708825 /def=sp|P53670|LIK2_RAT LI.   468    e-131

BLAST dbEST hits:
                                                Score  E
gi|10950740 /dataset=dbest /taxon=96...         1049   0.0
gi|10156485 /dataset=dbest /taxon=96...         975    0.0
gi|5421647 /dataset=dbest /taxon=9606 ...       952    0.0
gi|10895718 /dataset=dbest /taxon=96...         757    0.0
gi|13043102 /dataset=dbest /taxon=960...        714    0.0
gi|519615 /dataset=dbest /taxon=9606 /...       531    e-149
gi|11002869 /dataset=dbest /taxon=96...         511    e-143



EXPRESSION INFORMATION FOR MODULATORY USE:
library source:
From BLAST dbEST hits:
gi|10950740 teratocarcinoma
gi|10156485 ovary
gi|5421647 testis
gi|10895718 nervous_normal
gi|13043102 bladder
gi|519615 infant brain
gi|11002869 thyroid gland

From tissue screening panels:
Fetal whole brain
1 MVQDCQRNLA RLLLPVKVMR SLDHPNVLKF IGVLYKDKKL NLLTEYIEGG
51 TLKDFLRSMD PFPWQQKVRF AKGIASGMDK TVVVADFGLS RLIVEERKRA
101 PMEKATTKKR TLRKNDRKKR YTVVGNPYWM APEMLNGKSY DETVDIFSFG
151 IVLCEIIGQV YADPDCLPRT LDFGLNVKLF WEKFVPTDCP PAFFPLAAIC
201 CRLEPESRPA FSKLEDSFEA LSLYLGELGI PLPAELEELD HTVSMQYGLT
251 RDSPP (SEQ ID NO:2)

FEATURES: 

Functional domains and key regions: 

[1] PDOC00004 PS00004 CAMP_PHOSPHO_SITE

cAMP- and cGMP- dependent protein kinase phosphorylation site 

Number of matches: 2 

1 108-111 KKRT 

2 119-122 KRYT 

[2] PDOC00005 PS00005 PKC_PHOSPHO_SITE
Protein kinase C phosphorylation site

Number of matches: 4 

1 51-53 TLK 

2 106-108 TTK 

3 107-109 TKK 

4 111-113 TLR 

[3] PDOC00006 PS00006 CK2_PHOSPHO_SITE
Casein kinase II phosphorylation site 

Number of matches: 4 

1 51-54 TLKD 

2 76-79 SGMD 

3 139-142 SYDE 

4 212-215 SKLE 

[4] PDOC00008 PS00008 MYRISTYL 
N-myristoylation site 

Number of matches: 4 

1 73-78 GIASGM 

FIG.2A 
2 77-82 GMDKTV

3 150-155 GIVLCE 

4 158-163 GQVYAD 
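
The motif matches listed above can be reproduced with a minimal sketch. It is illustrative only and not part of the original figure. The regular expressions are a transcription of the published PROSITE patterns for PS00004, PS00005, PS00006 and PS00008 into Python syntax (an assumption about how the matches were generated). The sequence is SEQ ID NO:2 with residue 49 read as G. The names `SEQ`, `PATTERNS`, and `scan` are hypothetical.

```python
import re

# SEQ ID NO:2 (255 residues), transcribed from the figure; residue 49 read as G.
SEQ = (
    "MVQDCQRNLARLLLPVKVMRSLDHPNVLKFIGVLYKDKKLNLLTEYIEGG"
    "TLKDFLRSMDPFPWQQKVRFAKGIASGMDKTVVVADFGLSRLIVEERKRA"
    "PMEKATTKKRTLRKNDRKKRYTVVGNPYWMAPEMLNGKSYDETVDIFSFG"
    "IVLCEIIGQVYADPDCLPRTLDFGLNVKLFWEKFVPTDCPPAFFPLAAIC"
    "CRLEPESRPAFSKLEDSFEALSLYLGELGIPLPAELEELDHTVSMQYGLT"
    "RDSPP"
)

# PROSITE patterns rewritten as regular expressions (assumed equivalents).
PATTERNS = {
    "PS00004": r"[RK]{2}.[ST]",                   # cAMP/cGMP-dependent kinase site
    "PS00005": r"[ST].[RK]",                      # protein kinase C site
    "PS00006": r"[ST].{2}[DE]",                   # casein kinase II site
    "PS00008": r"G[^EDRKHPFYW].{2}[STAGCN][^P]",  # N-myristoylation site
}


def scan(seq, pattern):
    """Return (start, end, match) tuples with 1-based inclusive positions.

    A lookahead is used so that overlapping matches are all reported.
    """
    hits = []
    for m in re.finditer(f"(?=({pattern}))", seq):
        start = m.start() + 1
        hits.append((start, start + len(m.group(1)) - 1, m.group(1)))
    return hits


for name, pattern in PATTERNS.items():
    print(name, scan(SEQ, pattern))
```

Under these assumptions the scan reports 2, 4, 4 and 4 matches respectively, at the same positions as the list above.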

Membrane spanning structure and domains:
Helix Begin End Score Certainty 

1 142 162 0.872 Putative 

2 184 204 0.652 Putative 

BLAST Alignment to Top Hit: 

>CRA|1000682328847 /altid=gi|8051618 /def=ref|NP_057952.1| LIM
domain kinase 2 Isoform 2b [Homo sapiens] /org=Homo
sapiens /taxon=9606 /dataset=nraa /length=617

Length = 617
Score = 485 bits (1235), Expect = e-136

Identities = 241/265 (90%), Positives = 241/265 (90%), Gaps = 22/265 (8%)

Query 13 ULPvTO/MJBLDHPIMJOT^ 72 
L VKVMRSUDHPNVIJTIGVLYKDKKLNLLTEYIEGGTIJ^ 

Sbjct: 353 LTC/IO/MRSLDHPNvlKFm 412 

Ouerv 73 GIASGM- - — - - BICTVWADFGLSRLIVEERKRAPMEKATTKKR 110 

GIASGM " DJCTVWAOFGLSRLIVEERKRAPMEKATTKKR 

Sbjct: 413 GIASSiAYWSTCIIHJW^^ 

Query 111 TLPJCNDRKKRYTWGNPYWMAPEMDIG^ 170 

TU^DRKKRYTWGNPYWMAPEhOGKSYDETVTJIFSFGI VLCEI IGQv^ 
Sbjct: 473 TTJWNDflKKRYTWGNPYWM^^ 532 

Query 171 LDFGUWKLnOCFWTKPP 230 

UFGUmFWEKWTDCPPAFP^^ 
Sbjct: 533 LDFGLNViaFWEKFVPTDCPPAW 592 

Query: 231 PLPAELEELDHTVSMQYGLTRDSPP 255 

PLPAEl£ELDHTVSMQYGLTRDSPP 
Sbjct: 593 PLPAELEELDHTVSMQYGLTRDSPP 617 (SEQ ID NO:4)

Hmmer search results (Pfam):

Model    Description                                        Score    E-value  N
PF00069  Eukaryotic protein kinase domain                   100.1             2
CE00031  CE00031 VEGFR                                      4.9      0.14     1
CE00204  CE00204 FIBROBLAST GROWTH RECEPTOR                 4.7      1        1
CE00359  CE00359 bone_morphogenetic_protein_receptor        1.8      7.9      1
CE00022  CE00022 MAGUK_subfamily_d                          1.5      2.5      1
CE00287  CE00287 PTK_Eph_orphan_receptor                    -48.4    3.8e-05  1
CE00292  CE00292 PTK_membrane_span                          -61.8    2.1e-05  1
CE00291  CE00291 PTK_fgf_receptor                           -113.0   0.027    1
CE00286  CE00286 PTK_EGF_receptor                           -125.1   0.0021   1
CE00290  CE00290 PTK_Trk_family                             -151.3   6.5e-05  1
CE00288  CE00288 PTK_Insulin_receptor                       -210.4   0.014    1

Parsed for domains:

Model    Domain  seq-f  seq-t     hmm-f  hmm-t     score    E-value
PF00069  1/2     16     79        41     105       52.1     3e-13
CE00022  1/1     124    153 ..    187    216 ..    1.5      2.5
PF00069  2/2     81     156       129    182 ..    48.0     3.1e-12
CE00031  1/1     129    156 ..    1114   1141 ..   4.9      0.14
CE00204  1/1     129    156 ..    705    732 ..    4.7      1
CE00359  1/1     79     157 ..    287    356 ..    1.8      7.9
CE00290  1/1     9      218 ..    1      282 []    -151.3   6.5e-05
CE00287  1/1     1      218 [.    1      260 []    -48.4    3.8e-05
CE00291  1/1     1      218 [.    1      285 []    -113.0   0.027
CE00292  1/1     1      218 [.    1      288 []    -61.8    2.1e-05
CE00288  1/1     1      218 [.    1      269 []    -210.4   0.014
CE00286  1/1     6      218       1      263 []    -125.1   0.0021
1 TCATCCTTGC GCAGGGGCCA TGCTAACCTT CTGTGTCTCA GTCCAATTTT
51 AATGTATGTG CTGCTGAAGC GAGAGTACCA GAGGTTTTTT TGATGGCAGT 
101 GACTTGAACT TATTTAAAAG ATAAGGAGGA GCCAGTGAGG GAGAGGGGTG 
151 CTGTAAAGAT AACTAAMGT GCACTTCTTC TAAGAAGTAA GATGGAATGG 
201 GATCCAGAAC AGGGGTGTCA TACCGAGTAG GCCAGCGTTT GTTCCGTGGA 
251 CACTGGGGAG TCTAACCCAG AGCTGAGATA GCTTGCAGTG TGGATGAGCC 
301 AGCTGAGTAC AGCAGATAGG GAAAAGAAGC CAAAAATCTG AAGTAGGGCT 
351 GGGGTGAAGG ACAGGGAAGG GCTAGAGAGA CATTTGGAAA GTGAAACCAG 
401 GTGGATATGA GAGGAGAGAG TAGAGGGTCT TGATTTCGGG TCTTTCATGC 
451 TTAACCCAAA GCAGGTACTA MGTATGTGT TGATTGAATG TCTTTGGGTT 
501 TCTCAAGACT GGAGAAAGCA GGGCAAGCTC TGGAGGGTAT GGCAATAACA 
551 AGTTATCTTG AATATCCTCA TGGTGGAAAG TCCTGATCCT GTTTGAATTT 
601 TGGAAATAGA AATCATTCAG AGCCAAGAGA TTGAATTGTT GAGTAAGTGG 
651 GTGGTCAGGT TACAGACTTA ATTTTGGGTT AAAAAGTAAA AACAAGAAAC 
701 AAGGTGTGGC TCTAAAATAA TGAGATGTGC TGGGGGTGGG GCATGGCAGC 
751 TCATAAACTG ACCCTGAAAG CTCTTACATG TAAGAGTTCC AAAAATATTT 
801 CCAAAACTTG GAAGATTCAT TTGGATGTTT GTGTTCATTA AAATCTCTCA 
851 CTAATTCATT GTCTTGTCCA CTGTCCGTAA CCCAACCTGG GATTGGTTTG 
901 AGTGAGTCTC TCAGACTTTC TGCCTTGGAG TTTGTGAGAG AGATGGCATA 
951 CTCTGTGACC ACTGTCACCC TAAAACCAAA AAGGCCCCTC TTGACAAGGA 
1001 GTCTGAGGAT TTTAGACCCA GGAAGAATGA GTGATGGGCA TATATATATC 
1051 CTATTACTGA GGCATGAGAA GAGTGGAATG GGTGGGTTGA GGTGGTGTTT 
1101 TAAGGCCTCT TGCCAGCTTG TTTAACTCTT CTCTGGGGAA CGAGGGGGAC 
1151 AACTGTGTAC ATTGGCTGCT CCAGAATGAT GTTGAGCAAT CTTGAAGTGC 
1201 CAGGAGCTGT GCTTTGTCTA TTCATGGCCC CTGTGCCTGT GAAACAGGGT 
1251 TCGGTGACTG TCACTGTGCC TGTGGCAGTC TGTAGTTACC CAGAGAGAAC 
1301 AAAGCTGCAT ACACAGAGCG CACAAGGGAG TCTTGTAACA ACCTTGTCCT 
1351 GCTTTCTAGG GCTGAGTCAG GTACCACAGC TTGATCTCAG CTGTCCTCTT 
1401 TATTTCAAGA AGTTGACATC TGAGCCATAC CAGGAGTA7T GTATTTTGTT 
1451 TGAGGCCTCT CTTTTTGGAG GAACATGGAC CGACTCTGTG CTTTTGTCTA 
1501 TGCTGGTCTC TGAGCTCACA CAACCCTTCA CCCTCCTTTC TCAGCCAGTG 
1551 ATAGGTAAGT CTTCCCTATC TTGCAAGGCT CAGCTCAAGT GTCAGCTTCC 
1601 TCTACAAAGA CTTTCCTGGT TCCCCTCATT GGAGTGAACA AGAGTTGACA 
1651 TGGTAGAATG GAAAGAGCAG AAGCTTTAGA ATGAGCCAGA CCTGAGTATG 
1701 AATGCTAGAT CCACCACTTA GCTAGTCAAC CCTGCCCCCT GCCTCAAGTT 
1751 TTAATTTTCC TATCCATTAA GTGAATATM TAATACCTGT GTCACAGGAT 
1801 TATTTTGAGA ATTAAATGAG ATTAGGTCTA TGAAAGCACC TAGCAGAGTT 
1851 CTTGGCATAT AGGAGGCATT CATTAAATAT TTGTTCTTCC CCTTTTATAC 
1901 CCATWCTTT TCTTTTTCTG AACTAAAATA ATACTTGGTT CTATCTCT6A 
1951 AATAACATCC AAGTGAAAAA TCAACAACAT GAAAGAGCAG TTGTTTTCCA 
2001 GTGGATTTGC TTCTTAAGGA GCAGAGATTA TGTAATCTAA CAGCCTCCAA 
llll KSaga GCTTTGTATC TAGAACAGGG GTCCCCAGCC cctggaccgc 
2101 CAACTGGTAC GGGTCTGTAG CCTGTTAGGA ACCAGGCTGC ACAGCAGGA6 
2151 gtgagcggcg ggccagtgag cattgctgcc tgagctctgc ctcctgtcag 

2201 atcagtggtg gcattagah ctcataggag tgtgaaccct attgtgmct 

2251 GCACATGCAA GGGATCTGGG TTGCATGCTC CTTATGAGAA TCTCACTAAT 
2301 GGCTGATGAT CTGAGTTGGA ACAGTTTGAT ACCAAAACCA TCCCCCCGCC 
2351 CCCCAACCCC CAGCCTAGGG TCCGTGGAAA AATTGGCCCC TGCTGCCAAA 
2401 AAGGTTGAGG ACTGCTGATC TAGAGGACCA ATTTATTCAA TGTTGGTTGA 
2451 GTAAATGAGC TCTTGGATTA GGTGATGGAA AAATCTGAAA AAACAGGGCT 
2501 TTTGAGGAAT AGGAAAAGGC AGTAACATGT TTAACCCAGA GAGAAGTTTC
2551 TGGCTGTTGG CTGGGAATAG TCATAGGAAG GGCTGACACT GAAAAGAAGG 
2601 AGATTGTG7T CGTTTCTTCT TCTCAGAGCT ATAAGCAAAG GCTGAAAGTT 
2651 GTAGAAAAAG GCAAGTTTTG TTTCAGTAGA AAAAAGGATA ATCAGAACCA 
2701 TTTTTA6AAA ATGGAATGAG ACTACTTTTG AGGCCATGAG TTCCTTGTCC 
2751 CTGGAGAGAT GAGCAGAGGT TGGACAAGTG CTTACCAGAG ATCTTGTGGA 
2801 GGCAGAAACT GTGCATCTAG CAGAGCATTG GCCTAACCCT TTCAAATGAG 
2851 ATGCTGTTAA CTCAGTCTTA TTCTACATGG TAGGAATCCT GTCCCTTTGC 
2901 CTCCTGCTAC TTTGGGCCTC TCAACCTCTT GGTTTTGTGT GCAGGTGAAG 
2951 ATGTCTGGAG GTGTCCAGGC TGTGGGGACC ACATTGCTCC AAGCCAGATA 
3001 TGGTACAGGA CTGTCAACGA AACCTGGCAC GGCTCTTGCT TCCGGTAGGT 
3051 GGGCCTATCC TCCCATCTTT ACCAGTGTAC TATGGGCCAA gCACTATTTC 
3101 ATGTTCTGAT GGAAAACACA GAAACAAGCT TCTGAGTTGA GAATTTCAAT 
3151 CTTAGGGTGG GGAAAGGAAT GTACCAAGGA AGAGCTCATG ACCAAACCTC 
3201 AAGTGTGGCC CCCCTGAACC CAGGTTAAAT TGGAAGAGCC ATAAATGGGC 

3251 CAGCTGGAGG CAGGGTGGGG GGATGAGAGG AGCCCTTTCC AGGGTTGTCC 

3301 catatS ACTTTATGGG TGAGGAAACT GAGGCCCAGG AAGAGTGACT 

3351 TTCCTGTGGC TGCACTACAG ATTATGCAGG TACTTCAAGA GTTGTTTGTA 
3401 TTCTTATTTT ATTTTATTTT ATTTTATTTT ATTTTATTTT ATTTTATGAG 
3451 AGGGATTCTT GCTGTTGCCC AGGCTGGAGT GCAGTGGTGC AATCTCGGCT 
3501 WCTGCAATC TCTGCCTGCT GGGTTCAAGT GATTTTTCTG CCTTAGCTTC 
3551 CTGAGTAGCT GAGATGACAG GCACCTGCCA CCATGCGCAG CTAATTTTTG 
3601 TATTTTAGTG GAGACGGGGG TTTCAACATG TTGGTCAGGC TGGTCTTGAA 
3651 CTCCTGACCT CAAATGATGC ACCCACCTCG ACCTCCCAAA GTGCTGGAAT 
3701 TACAGGCGTG AACCAGTGTG CCCAGCCAAG AGTTGTTnT AGTGTGGTTG 
3751 GCAGAGCCAG CTCTTCCTTC ACCACAGGAT GCCTCCCTAG GTrCCTACTT 
3801 TTTGTTACTA GCTTTTATTA TAGCTATATT ATTATTAnA TTATTATTAT 
3851 TATTATTATT ATTATTGAGA CAGAGTCTCG CTCTGTCGCC CAGGCTGGTG 
3901 TACAGTGGTG CGATCCCGGG CTCACTGCAA CCTQGCCTC CCGAGTTCAA 
3951 GCAGTTCTCC TGCCTCAGCC CCCCGAGTAG GTGGGACTAC AGGCGCCTGC 
4001 CACCACACCC GGCTAATTTT TGTATTTTTA GTAGAGACGG GGTTTCACCT 
4051 TGTTGACCAG GCTGGTCTGG AGCTCCTGAC CTCAGGTAAG TGCTAGAATC 
4101 ACAGGCGTGA ACCACTGCGC CCAGCCAAGA GTTGTTTTTA GTGTGGTTGG 
4151 CAGAGCCAGC TCHCCTCAC CACAGG7TGC CTCCCTAGGT TCCTACTTTT 
4201 TGTTACTAGC TTTATTATAG CTACATTATT ATTATTATTG TTATTATTAT 
4251 TGAGACAGAG TCTCGCTCTG TCGCCCAGGC TGGTGTACAG T&ATGTGATC 
4301 TTGGCTCACT GCAACCTCTG CCCCCCGAGT TCAAGCAATT CTCCTGCTTC 
4351 AGCCCCCCTA GTAGGTGGGA CTCCAGGCAC CTGCCACCAC GCCCAGCTAA
4401 TTTTTGTATT TTTAGTAGAG GCGGGGTTTC ACCTTGTTGG CCAGGCTGGT 
4451 CTCAAACTCC TGACCTCAGG TGATCCGCCT GCCTCGGCCT CCCAAAATGT 
4501 TGGGATTACA GGCATGAGCC ACCGCGCCCT GCCTATAGCT AWTTATTTT 
4551 TGTAGGCAGC TCAGTTTCTT AAAAATTATA CAGACTTGAA ATCAGATTTG 
4601 TTCCTGCTGT CTGAGGCTCA GTTTCTTCAT CTGGAAAATG GAJGGTAATA 
4651 ATCTTGTTGA GATTGAATGA AATAATATAT GCAGTGTATC CAGTACATGG 
4701 TAGACACCCA GTGAATGGTT ATTCCTTCCT CCCATCGGAT TGGAATTCTC 
4751 AAGGGTGGGA ACTTGTCTTT ATATTCTTCA CAACGTAAAA TAGTTGMAT 
4801 TTGTTGGTGG AAAGAAGAGC AGTCCACTCC AGAGGCTGGA JjGGGCATGCC 
4851 TCGCCCCCAA GGTCTGAAGT GGTAGGGCTG TGCCTATATC CTGAGMTGA 
4901 GATAGACTAG GCAGGCACCT TGTGCTGTAG ATTCCAGCTC CTGCACATAG 
4951 CTCTTGTTGT AAAACATCCC TGTGCTTATA CCAAGTAATT GAGTTGACCT 

FIG.3-2 



U.S. Patent Jan. 22, 2002 Sheet 8 of 41 US 6,340,583 B1



5001 TTAAACACTT GCCTCTTCCC TGGGAACCAT ATAGGGGATT GGCCTGGAGA 
5051 CGTCTGGCCT CTG6AAGAGT TGGAAAGCAG CCATCATTAT TATCCTTTCC 
5101 TTTCAGCTAT AACTCAGAGC TCTCAAGTCT TTTCTGTGGA TCTTATTGCC 
5151 TTGG7TCTTG CCCCTTTTAC TCCCAGGGAA GTTGATTCTG TCTTTTCTGT 
5201 TCCATTTAGT ATGACAGGAG CAGAGAATGT CAGAGCTGTA AGGGACCTTA 
5251 TAGTTAAAGC CTTTGGCTGG TCCT7TCATT TTATAGCTGG GACTAATAAG 
5301 TAACGTCAAA ACCCAATGAG TTCACAGATT GGGTCTCGCC TTGGCATGTA 
5351 ACCCATATGT TCATATTCTT GCTGTTTTCC TATGTGTATG AATATTTTCT 
5401 ATCCAAAATA AGCAGGACAG GGTAGAGCM GTTAATCTTT GGAATTTCTG 
5451 GATTCTCTTA GAGCTAAAAA ACTTCAGAAC TAGAAGAAAC CACCCACTAT 
5501 ATGGTATAAC CCATTCATAT CACAGATGAG GCCTGAAACC AAAAAGACTT 
5551 GCTCAGGCCA TGGATGACAA GAGCTGGCCC TAGCACTGAA CTCTTGGGTC 
5601 ATTTGTAGGT CTAGTCAGAT GCTAGCTTGT TAGCTCTGTG CGTGCGTGTG 
5651 TGTGTGTGTG TGTGTGTGTG TGTGTGAGAT AGAGACAGAA AGATAACATA 
5701 TGTACACAAA TACATAAAGA GGAAGTAGAC ACGTTAGCAT GGTAGATAAG 
5751 AGTACAGGCA GGCCAGGCGT GGTGGCTCAC GCCTGTAATC CCAGCACTTT 
5801 GGGAGGCCAA GGCAGGTGGA TCACCTGAGG TCAGGAATTC GAGACCAGCC 
5851 TGACCAACAT GGTGAAACCC CATCTCTACT AAATACAGAA AAAAATTAGC 
5901 TTGGCATGGT GGCACATCCC TGTAATCCCA GCTACTTGGG AAGCTGAAGC 
5951 AGGAGAATCG CTTGAATCCG GGAAGCAGAA GTTGCAGTGA GCCGAGATTG 
6001 TGCCATTACA GTCTAGCCTG GGCAACAAGA GGGAAACTCC ATCGCAAAAA 
6051 AACAACCACC ACCAAGAGTA CAGGCTATGG AATGAGACTA TGGTTTTAAA 
6101 TCCTGGCTTT GCAATTTATT AACTAGCCTT MGTGACTTC CCTGAGCTTC 
6151 AGGCACCAAT CTGTAAAATG AGGATAAGAA TATTACTCAT GCCACATGGT 
6201 TGTTAGGGAG GATTAAATGT GATAACCTAT ATAAAGTGGC TAGCATAGCA 
6251 TCTCACATAT AGAAAACTCT TAATAGGGCe GGACGTGGTG GCTTATGCCT 
6301 GTAATCCTAG CACTCTGGGA GGCCGAGGCA GAAGGATCGC TTGAGCCCAT 
6351 GAGCCCAGGA GTTTGAGACC AGCCTGGCCA ACATGGCAAA ACTCCACCTC 
6401 TACAAAAAAT ACAAAAATAT TAGCCAGGCG TGATGGCACA CACCTGTAGT 
6451 CCCAGCTACT TGGGAAGCTG AGGAGCGATG ATTACCTGAG CCGAGGGATA 
6501 TCAAGGCTGT AGTGAGCTGT GATCATGCCA CTGTACTCCA TCCAGCTGGG 
6551 GGACAGAGTG AAACCCCTGT CTCAAAACAA AACAAATGAA AAAAAAAACC 
6601 CTTAATAATC AGTAACTGTC ACTTTATAn ATGTTGTGAG TGTGTGTCTA 
6651 TATACACCTA TATGTATACA TTTCTCTTAT TACACATTCA TTGGTGATCT 
6701 GATGTGGAGC CCCAGGGATT AAGGGCAACT TTGAACTACC CTGACACAAT 
6751 CAAGCCAAAT ATCATTCCCG TGGAGGAAGT AGAGTATCTA GGTTCTGTCT 
6801 CCTAGTTGCA GCTTTACCTT GAGGACAGAG ACTCTAATCC AGCTGTGCTG 
6851 AAGGAGCACA TCTCCTGACT TCTGAGCTTT CCCCTGGTAA ATTCAAACTG 
6901 GATGTCACGG CGCCCTCAGA TAGAGCCTGG TAATTTGCCC TGGGGAGAGT 
6951 GACTGTCTTT TGGATCTAAT TTGACTTTTG CCCCAGTTGG AGGAAAATCT 
7001 TCAGGGCTAG GAAGGATTGT ATTTGTCTGA CCCCAGAGAT AACCTGGGTT 
7051 TTGAGGAACA TGGGGCATCA ACCTGAATGG TCTTGTAAGA TCTCtCCCAC 
7101 GCCAGCTTGC CAGTGTTTCT CTGATGAATT TAGAGTACCT GAGTAGTGCA 
7151 GGCCTGCTGG GAGGAGGACT CTCCCTCTGT GCTACTCAGA GAAATTCATT 
7201 CTTCAAGGCC CCCHCCAGC CTTGCTCTTA CCCAGCTGGG CTAC AGTTAC 
7251 AATAAAGGAA ATGACTTTTC TTCTCCCCTT CCCCCAGTAC CTTTGTTTTC 
7301 CTAGTCACAG GGTGGGGCTG GATATTGAAT GGAGAAATTG CTGGGGTCCA 
7351 TCCTAAACTC CTCCCCTCAT CTCTCCCTTA CATTACCCCA TTCTTCTGTC 
7401 TGCAGCCACA TCCATAATCC TGCCTCTGTT AGCCTTCCGA CAGACCCTCA 
7451 GGTGCCCAGG ACAACAGGAA GCTACTTAAA GCTGGAACCT CAGACTGTGC 
7501 aatrgagGCC AGTGACAAAA CTGAAAGTAG CTCTGTCAGT AATTGTGCTG

5wl ctKtag Stggcc agaatctttt ggatctcctg gacatatggc 

7fiol TGACTAGTCC TCCCAAGCCT TCCCAACAGG CCTCTTTTTT TTCCmTTT 
7fiM TCTTTTCTTT 7TTTTCTTTC TTTCTTTCTT TCTTTITTTT TTTTTTTTAG 
■wSSraSfi T&AAATTGTG GGAGTGGAAA AGGAACAAAG AAATCGGTAA 
7751 CTGGTAGTGA TCAATTAC7T GTAAACACTA TTGTACTTGG ACCAGCCCAG 
TAPfTCTTTT TTAAAACTCT GAG7TACCTC TCTTTCCTTT CC7TGAGCAG 
7851 TGCCATTAAT TcScTG GGGCAATCCT TTCTGATGTT CTCTGGACCT 
*5l GttTCTCTCT CC1TAGGAGA GGCCAGGAGA GTAGCCAGAG AGCATGTCAT 

w£ Sgctga GGTTAAAGTG TGGAGCTATC AATGGTGACC tggcctcttg 

finm fTATfTTAGC AAGCCAGAGG ACCTTGACAA CTTTTTTGAT GA7TGTCCGT 

S2S tmccctcat caaaggtgtt TGGCTTAGGA ggagggaaga aaagctaccc 
810 ctSSgtct tSccc agcgtgggtc tctattgctt gacctggttc 
55 ctagcagcat tatcagaagg aaaatccacc gctcttaagg ctcctgggaa 

8201 CmCAGGAC TTCCTTTCTC AGGATTGCAA ACATMGACT ATTT&AGCTT 

Si tcacttttga AAAGCGGTTA ctaataccta TACTCTGGGA aagggctaat 
Hi SagaSgaa gactctggtc actgcatcag gcaacagacc atttccgcta 
M51 mtttagtga ctccaggaag gccagtgaag aaataacaca cgtagcaacc 

Sni ATfiPArTGTG TTGTAATATG TTGGCTGACA GCAGGGTACT 7TCTGTGATG 

ocni crTATAATfA TAATTACAGT aataggtacc acttattgag TACTCTGTGC 
£5 SSSSct cct&agcata cgacatgcat agcacattta atccttacaa 

mm T^CTTAATA AMTGTAGTA CTAGTCHAC CTACTTC6AG AATAGGGAAA 
S JSttaC TTGTTTAAAG TCACAGAGCT AATAGGTAGC ATAGCTGAGA 
™ ESKcTCA ScATTCTTA CTCCTTGCCT GCMGAGTCT CTTGGCAnC 

S2J IKSEKa gcatatttct TAACCTCACT GAGGCTCAGT ttcctcttat 

25 rARATAACAT TGAAGGGTGT TAGTTTAAAG GCTTCATGGA CTCTATAATG 

ss sssss ss she? ssssss 

9051 GTCTTWCrc AmGTCCAG TTTATCTTTT AGGAAACAGC CAGCCCGTAG . 
S5 ATCATTAAGG CTGGCTATTG GACAGGGGGC TGGGGCCTGC CTGACAGAGG 
55 S^liffifiC AGACATCTGG TTCTTCCTCT GCCCCTACAA GAGACTCCAG 

£K SSmmcr Staggatgt agcagcagca tatoagcttg 

s rs SKSSS BS 3SSS& ssss 

SS SI ACTGATAnA = TT GTATTMGAA 

gSSS sssss s s 

ncci rrrrArrTPA GTfTTGCAAG CAGCTTG6AC TACAGGCGTG CCALLALAU, 
%01 TTOCOTTT TTmATTTT MGTAGAAAC AAGGTCTTAT TAATACTATG 
3] V^rrlrxr TGGTCTTGAA CTCCAGCGAT CCTCCTGCCC CAGCCTCCCA 
9701 MGTGCTTGG ™S CTMGCCACT GTGCCTGGCC AGTGCAACCC 
S5 JStttMA CTAAAACAGG AAGGCCCAGA AAGGTTTGGA GTAACTTGTC 
«ni caStcaca CAOATGATAT TTGAACTCAG GTCTCCCTGG CTCCCAAGAG 
25 2Xttt CCACTAGGAC TCCCAGGAGA AAAAAAAAAA AAAAAACAGT 
9901 SSJ kwmtc tgatttgagt CTTAGTTGAG CTAG&CTAAC 

9951 TGTGTAACTG TGGGCAAGTT CCTTAGCCCC TGTGAGCCTC AGTTTCTTAT 
10001 CTGTAAAATG TCATAAAAGA AATCCATCTC ATGGAGTAGT TGTGATGATC
10051 MGGACTCTG AAAACATTAG AATGGTTTAA TGTGAAGGAT TAGCAGCAGC 
10101 ACATGGCAAC ATTGTGCATC TTATATTAAC TATCCAAATA TATCAAGCGT 
10151 CATTTCCTAT ATATAAAAGT CATCAAATTA GGCACTGTGG GGGATACGGA 
10201 GnGGCATAC TAGCCTGGCC TCTTAATTAA TTCATTAATT AGCTTATJTA 
10251 TTTTTCAGAT AGGTCTTGCT CTATTGCCCA GGCTGGAGTG CAGTGGCATG
10301 ATGATAGCTT ACTATAGCCT CAATCTCCCA GGCTTAAACA ATCCTCCTGA

lo35i gtaSt&sga ctacaggcac acactaccat gcccagctaa iiiiiiiiia 

10401 ATTTnTGTA GAGACAGGGT CTTGCTCTGT TGCCCAGGCT GGTCTCAAAC 
10451 TCCTGGGCTC GAGATCCTCC CACCTGGGCC TCACAAAGTG TTGGGATTAC 
sol Sue CACGGCACCT WCJBSTCT CTTAACTQGT TCCCTAAGAC 
10551 AGCTGGAAAT AGAGAATGTC ATGGAGCATT CCTAACCATG GGCTCCAGCC 
10601 TGGCTTTCAT TCTGTTTCTC CCCTGAAACA ACATTCCTTT AGTAATATTC 
10651 CGAATMCAG CTTCATCAGT CTGTCTACCG ACCACTCTTC AGGCTTCATC 
10701 TWATGACC TCCCAAACTG CACTAAGGGT TGTATTAGAG AAAAGTGGAT 
U751 MAGTTCGGA GTCAGGCTGC TTGAGCTTAA ATGCCAGCTT CACTTACCAG 
10801 CCACCTGACC ATGAGTCAGC TGCTTAACCA TTCTTTGCCA CAGTTTCCTT 
10851 GTCTATGAAA AGGGAAATGG CTCCCACCTC AAAAAGTTGT TAACATTAAA 
10901 TTCAATCATG TATTCAAAGT CCTGAGCAGA ATGTCTGGCC ATGACTGGGA 
10951 CTTAACAGAT GTTAGCATTT ATTATTAGTA TCTGTCAGTC TTGAAATGTT 
11001 CTCTTCCCTT GGCTTTCATG ACATTCCACA CTCTCCTGGT TTTCTCTTAC 
11051 CTCTCTGGTA ATACCTGTTT GCTTATCCTT CTTTGTCCAG CTCTGGGATG

11101 TTACCATTCC TTCAGGCGTG CTGTTTTCTC CTTAGGCAGT CTTACACACA

11151 CTCATCACTT CCTTCCATTG TCCTCCACAC ACTGATGACC CTTWATCAG

11201 TATCTCCAGC CTAAACCTTT CCACTGAGTT CTAGACCCAT ATGTTGTACT 
11251 ATCAACCTGfi CTT^TCCATT T6AATGTCTT CCAGGCACTT CAGACTCTCT 
11301 TCTCTAGACT TTGCTGGACT TTCACTCTTC CCCCTAAAAC TGGCTCCTCT 
11351 TCCACTGAAA CATGTATGTC ATTGAGAGGC ACCACCATCC ACCCAGTGCC 
11401 TAAGCCAGAA ACCTAGGAAT CCTTGATACC TGTTCTCTCT CATCCTGCAT 
11451 ATCCAAGCCT ATCAGTTTTA TCTCTAAATT ATATTTTGGT AGGTTTACTT 
11501 CTTTCCHTT CTCCCACCAC CACCCTGCTC CAAGCTACCA TCATCTCACC 
11551 TGGATGTCTG CAATAGCCTC ATCTCCCACA GCCACTCTGC ACCCCCTMT 
11601 CTGTTCTCTA TAGAGCAGTT GGAAGGAGTG ATTTTTGTTG TTTGTTTTGT 
11651 TTTCTTTTAG ACAGAGTCTC ACTCTGTTCC CCAAGGCTGG AGTGCAGTGG 
11701 CACAATTTCG GCTCACTGCA ACTTCTGCCT CCCGGGTTTA AGCAATTCTC 
11751 CTGCCTCAGC CTCCCAAGTA GCTGGGATTA AGGCACCGGC CCCCATACCC 
11801 AGCTAATTTT TATATTnTA GTAGAGATGG GGTTTTGCCA TGTTGGCCAA
11851 GCTAGTCTCG AACTCCTGAC CTCAAGTGAT CCACCTGCCT CGGCCTCCCA
11901 MGTGCTGGG ATTACAGGTG TGAGCCACTG CACCTGGCTG GMGGAGTGA 
11951 TCTTAAAAAA AAAAAAAACA AAAAAAAACT TGACTGTGTC ACTCTGTGTT 
12001 ScTCTCCTA CCTSAC TTCCACAACT TCCCAGTG£ CTTgATAAA 
12051 6ACCAAAATC CTTAACTTGG CCAGGCGCGG TGGCTCACAC CTATCATCTC 
moi aSctttgg GAGGCCGAGG CAGGCAGATC ATGAAGTCAA GAGATTGAGA 
12151 CCATCCTGGC CMCATGGTG AAACCCCATC TCTACTAAAA ATACAAAAAT 
12201 TAGCTGGTCG TGGTGGCGTG TGCCTGTAGT CCCAGCTACT TGGGAGGCTG 
12251 AGGCAGGAGA ATCACTTGAA CCTGGGAGGC AGAGGTTGCA GTGAGCCCAG 
llloi S TGCACTCCAG CCTGGTGACA ^GTMGACT CCATCTCAAA 
12151 AAAAAAAAAA AAAAAAAAAA TTCCTTAATT TGGCCJACAG TAGAGCCCTC 
12401 CGTAATCTGG CCTCTCTCCA CATCTCCACA ACCTCCTGCT CCCTGCACTT 
12451 CAGCCtS^ TCTCTfcTGG ACAGGCCCTC CTTCTGACAA GGGCTTTGTT 
12501 CATTCTGCTC CCTCTGCCTA GAATGCCCCC TTACTCTGTT CACTTAACTC

12551 CTGCTTATCG TTTAGATCTT TACCTGGATG GCTCAGAGAA ATATAGAAGT 
12601 AATTCCTCAC CCTGAAAAAT AGGTTAGGTC CCTGTTTTAT GTTTTCATAG 
12651 ACCTTTCCTT TGAGGCTTTT TTTAAAAAAG TAGTTTTAAT CTCACAT7TA 
12701 TTCATGTGAT CATCTCCTTA ATGATATCTT AAGACCTCTA ATAGAACAAT 
12751 TTGGTCATGG ACTGTGGGGT TTTTGCCCCT CATTGTGTCA GCACTGAGCA 
12801 TATTGTTGGC ATAGGAGGGA TATTTGTTGA ATGAATTGCT AGAGGTGGCC 
12851 AAGAGATATG ATGTAAGTCA GGCTTTTCCC TGCCCTTCCC CTTCCCCTTC 
12901 CCCACATCCT TCCTATAGCA GCCACCGTGG CTGCAGTTAC TGTAAATGGC 
12951 AAGACGGAAT CAGTTCCGGA CATTGGGTTG T7TTAGAAAA 7TGCCTGCAA 
13001 GTGTCAGGGT GATAAGTTAA AGCTTTGTCT TTTGCCCTCA GAGGAGCTAT 
13051 CCCATAGTGA GTAGAAGCCA GAGAAGCTGA CCCCAGGAGT CCTTCTTTCC 
13101 AGCAGCAGGT CTTGAGCTGC ACTTCTCTGT AGCTACAATC CAGGCAGGAA 
13151 CAAGCCCTAG GTACCTCCGG AGAGGAGGGC AAGAGAGGAA GAATGAGTTC 
13201 AGCTACTCTA GCCACCAAAC TGATTATGAA TTGCCCTGAA ATCTGAAAAA 
13251 TTTCAATTCC AATCGTAAGT TTGTTTTGTT TCATTTTGTT TTCTTAAATT 
13301 GTATATTTGA AAGATGGCAT TAACTAAAGA TATATATTCA ATATAGAGTG 
13351 GAAAAAATGG AATACTTGCA TAGTATCTTT TACTTATAGG TGATTTATGA 
13401 TGGGGAGTGG GGTGGATAGG TTGGCAGTTC CCCCAAGAAG TTGGAAATGA 
13451 AGTTTGTCCT CTGTGAGTTG AACTAATTAG ATCCACAAGT AATGAAAGCA 
13501 GTATTGTGTT GTAGTTAAGA GCACACTCTA GAACCAGATT GCTTAGTTTC 
13551 AAATCCTGGT TCTGCCTTTT ATTATCTGTG TAC7TTGGGC AAGTTACTTG 
13601 CCCTTTGTGT GCTTCATTTT TCTCATCTAG AAAATGGAGA GGCCAGGCGT 
13651 AGTGGCTCAT GCCTATAATC CCAGCACTTT GGGAGGCCGA GGCGGGCAGA 
13701 TCACCTGAGG TGAGAAGTTC AAGACCAGCC TGGCCAACAT GGTGAAACCC 
13751 TGTCTCTACA AAAATACAAA AATTAGCCAG GCATGATGGC GGGTGCCTGT 
13801 AATCCCAGCT ACCCAGGAGC CTGAGGCGGG AGAAACACTT GAACCTGGAA 
13851 GGCAGAGGTT GTAGTGAGCC AGGATTGCAC CACTGCACTC CAGCCTGGGT 
13901 GACAAGAGCT AGACTGAGTC TAAAAAAAAA AAAAAAAAAC AAACTGGAGA 
13951 TACAGGCTGG GTGCAGGGCT TACACTTATA ATATCAGCAC TTTGGGAGGC 
14001 CTAGGCGGGA GGATTGCTTG AACTCAGGAG TTTCAAGATC AGTCTGGGTA 
14051 ACAGAGCAAG ACCTCATCCC CACAAAAAAT CAAAAATTTA GCCAGGCATG 
14101 GTGGCTCATG CCTGTGGTCC CAGCTACTCA GGAGGCTGAG GCGAGAGGAT 
14151 TGCTTGAGCC CAGGAGGTTG AGGCTGCAGT GAACCATGAC TGCACCACTA 
14201 CATGCCAGCC TGGATGACAG AGCAAGACCC TATCTCAAAA AAAAAAAAAA 
14251 AAAGAAACGA GCCAGGCGCG TTTGCTCACG CCAGTAATCC CAGCACTTTG 
14301 GGAGGCCAAG GCAGGTGGAT CACTTGAGGT CAGGAGATCG AGACTAGCCT 
14351 GGCCAACATG GTGAAACCCC ATCTCAACTG AAAATACAAA AATTAGCCAG 
14401 GCATGGTGGC ATGCTCCTGT AGTCCCAGCT ACTCACTTGG AGGCTGAGGC 
14451 AC6AGAATCG CTTGAACCCA GGAGGCGGAG GTTGCAGTGG GCCAACATCA 
14501 TGTCACTGCA CTCCAGCCTG GGAGACAGAG CGAGACTCtG TCTCAATAAA 
14551 TAAATAAACA TAAAATAAAA TAAAATAAAA TAAAATAAAA TAAAAAAATA 
14601 TGGAGGCCAG CAGGCACGGT GGCTCACGCA TGTAATCCCA GCACTTTGGG 
14651 AGGCCGAGGG GGGCGGATCA CAAGGTCAGG AGATCGAGAC CATCCTGGCT 
14701 AACACAGTGA AACCGCGTCT CTACTAAAAA TACACAAAAT TAGCCAGGCA 
14751 TGGTGGCAGG CACCTGTAGT CCCTGCTACT CAGGAGGCTG AGGCAGGAGA
14801 ATGGCGTGAA CCCGGGAGGC GGAGCTTGCA GTGAGCTGAG ATCGCGCCAC 
14851 TGCAGTCCAG CCTGGGCGAC AGAGCAAGAC TCTGTCTCAA AAAAAAAAAA 
14901 AAAAATGGAG GTTGGGCGCG GTGGCTCGCG CCTGTAATCC CAGCACTTTG 
14951 GGAGGTCGAG GCGGGCGGAT CACCTGAGGT CAGGAGTTCC AGACCAGCCT 
15001 GGCCAACATG GTGAAACCTT GTCTCTACTA AAATTACAAA AATTAGCCAG
15051 GCACGATGGC AGGCACCTGT AATCCCAGCT ACTTAGGAGA CTAAGGCAGG
15101 AGAATAGCTT GAACCTGGGA GATGGAGGTT GCAGTGTGCT GAGATCGCGC 
15151 CACTGCCCTC CAGTAGAGTG AGATTCCGTC TCAAAAAAAA AAAAAAAGAA 
i wni rAAATCGAGA TACAAACTTA : CTACCTACCT CCHACAACC TACCCTCACA 
15251 GTATTACTGT GAATAAAAGT GTGTGTAGCA CTGGGAACAC TATTCACAGA 
15301 GCACTCATGA ATGTTTGTTC TTTGTTATTA GTTACTAGAG AGGCAAATGT 
5 53 Saatat GTGTGAATTG GTGATTGTCG CACATATCTA 

15401 AAGAAGTAGT TATTTTTTTC AATTAAAACT TAGTTTAAAA ACCAATATAA 
15451 GGCCGAGCGC AGTGGCTCAC ACCTGTAATC CCAGCACTTT GGGAGGCCGA
15501 GGTGGGCAGA TCATTTGAGG TCAGGAGTTC GAGACTAGCC TGGCCAACAT
15551 GGTGAAACCC TGTCTCTGCT AAAAAAAAAA AAAAAGTACA AAAATTAGCC 
15601 AGGCATGATG GCAGGTCCCT GTAATCCCAG CTACTTGGGA GGCCGAGGCA 
««i RTARAATTGC TTGAACCCAG GAGGTGGAGG TTGTAGTGAG CCGAGTTTGT 
§701 gccactSac TTCAGCCTGG GTGACAGAGG GAGACACTGT CTCAAAAAAA 
• ^5751 AAAAAAAAAA ACCAAAACCA ATATAATAAA TAAGTGGCCA GCAATGAAAC 
l55 AGAAAGTGAA MGTTAGTGA AGCAAAACTA GTACTGTATT CAGATAAAGA 
JllSi ^SSatct agatttggtc ACCAGAATAG GGTCCTTTGT GGCAACCTGG 
15901 GCTAGTTTGG CTCACTCACC ACTGCCAGGA TGAAATTTCT TTCAGTGGCT 
S ACTCATTTCC CTTTATTTTA AGTCCATGCT CACAGAGCAA CCTTCTGATG 
16001 CCTMTTCAG CTTCCTGGGA TACTTAATAA CAGGAAGGGT CTGGAAGTAG 
16051 TACCTGTATA GGGGATATGA GTGTTCTGAT TTTAATAGTC AATTCATAAG 
16101 TGTACAGAGG GTTTGATAAA TGGTTAGGTC AGAACCATCA CAGAATGTCT 
16151 ACACCTCTTT GGACATTAGG AAGGTCAAAA ACCTGAAAGG CCAAAAGCTA 

ifiwr rTGGGTGGTC CACCAGTCAA CCTTCCTTTG ATCAC ACCTC CTTCGTCGTT 
16301 GCTTCTmA GCATTGACCT GTAATGGGTA TGGAATTTTT TGCTCACaA 

SanSr vmaem aagaagttga agcccagaga gatttaatgg 

ifiAM fTTGCCTAAG ATCACACGCA GATTTTCTGT TAACCAGGGT GAl III iCAG 
16451 GTGTTCCCTG CCAGACGAGG GCTTTTTTCC TTGAATTGCC TAGAGATTTC
16501 TTGAGATATC CGAAGCATTT TTCCCAGTGC AGCCTGGAGA AGGATGTCCC
16551 TGTCAACACA GCATTTGTTA CTCAATGTTA GACATTCAAT TTTCTAATTA 
16601 GTATCATGGA GCAACAGTGG ATGATTATCT ATAAGGGGTT GCAATTCCAT 
JsEi rmATGTGC TTACAGCCCA TATAGACAAA TATCAGCTGT TAAAATGACA 
iSoi aSagtaga Stgtggccc CAGGACAAAG GCATACTCTG CTGTTAGTGA 
16751 aXttc gccagcaaat ttcacatggg CATATACACG GCCAACTGTA 
80 ScttKS aWaccc ahcagagag ccaaactggc aactaaagat 

tfiflci part ATTCTC TTTGGCATTT CAGCTTTGCG TTCTGTTAM AATCACTGCT 
16901 TGOTAMTA CCTCTGATAG CTCTTCACTG CCTGTAGGCA ACTCTTTAGC 
1 §S SSagact TGGTCTTTAG TGCTCTGCCC CTACTCTCTT ccaccattct 

i7fTCi AffTGCTCAG CGTTATATGA GCATACCATA CTCTTTATGC CTCAGTGCAT 
17101 TTGCACATGT TGTTCCTTCA GGCCAGAATG CCTGTTACTG CCTGGCAATC 
17151 AGCCTATTAG AGTCTGCCAA TACCATCCCA TCTTCTGTGG AGGAGCCCCC 
\m\ cgcSaatcc AGCCATACCT ctccccacca ATCAGAGACT tcttctctct 
17251 TTGTTATTCT CTTCGTTATT CTCTTCATAC CTCAGTTATA TCCATTTCAG 

}™T SfTTGnTA CACATCTAGC ATCACTCTTA GAGTGTGAAA TTCTCCAAGT 

73 gtogccgt at tagtttg tctttgtatc ccagagctta gcaaagtgcc 

i-rYni taStttag TGGGTGCTCA GAGTGTTTGC TGGGTGAATG ATGTATTTGT 
17451 tS?C TnS WW CCA1CCAGTA TTOTTA 
17501 CCATCTCTTC GCTCTACAAT ATTCTTTTAG GCAAGAGCTT ATCTTTTGAG
17551 GTGATAAGAT AAGCTCAAAC TTATGTAGAC TAAGACCTCA GTCTGTAAAT 
17601 GTCATCCCTA AGTCTTAAAC CATCAAAACC AGGGCCTCAA GGA ATGGCA T 
17651 GCCTTCTGCA ACTGTAGCAA CCTGCTGTGC TTATTTTGCC GTGTTTTTCA 

17701 rmrccccc aaaagctaga gtcccttctc ccatgggcag tgctggaagt 

17751 GTGCTAACAA ATTCTTTCTC CATACTGCTT ACGATTACAA AAAAAACCCT 
17801 CAGCATCTCA TGCCAGACTT GAGTTAAGGT tgttttcttt TGTGTGTCAG 
17851 CTGTATTCTG GTCATGACTT CCTGATGATG CCCTATAGAG ATTTTGCTGA 
17901 GATCAGAGGG TGCTCCACTG CCATCAGTAG CACTGACTCT TGCAGAAGCA 
17951 CCG7TTCTGA AGTTGGCTAA TGTCATCCCT CACGTTTGTT TGTTTGAAAT 
18001 TTGTTTTAGT TCCAGAGATA GCACTTTCAT GGAATGACGC TATCTTCTAG 
18051 AATCACTTTT HIIIHI I I TGAGTTGGAG TCTCGCTGTG TCGCCAGGCT 
18101 GGAGTGCAGT GGCACAATCT CAGCTCACTG CAATCTCCAC CTTCCGGGTT 
18151 CAAGTGATTC CCCTGCCTCA GCCTCCCGAG GAGCTGTTAC TACAGGCGCA 
18201 CACCCCCACT CCTGGCTAAT TTTATGTGTT TTAGTAGAGA CGGGGTTTCA 
18251 CCGTGTTGGC CAGGATGGTC TCGATCTCCT GACTTTGTGA TCTGCCTGCT 
18301 TCAGCCTCCC AAAGTGCTGG GATTACAGGT GTGAGTCACC GCGCCTGGCC 
18351 TAGAATCACC TTTTTATACC ATAACGTGAG CACCACTGCC GCGTCACCAA 
18401 GGAAAGAGAG AGGCAGCTAC TGTGGGGTTA CAAATGGGTA AGAGTGGCAC 
18451 CAGGAAGGTG AAAGTCTCTA CTTAGCCAAG GCTTAACAAA ATGTCAATCA 
18501 CCAAACATTT ATTTATTAAG CTAC6TTCAG GATAAGAAGA TGAACAAGCT 

18551 ATCTGTACAT TCATTTTCTC GTTTGTAACA AGGTMTGAT AGTGATCTAT 

18601 CCT6CCT6CC TCTGAGG6TT ATTGTGAGAA TAAAATGAAA TCAAGTGGM 
18651 MGCACTTAG GAAAAAGAAA agcattggtt ttcaattgtt agtgtggatc 
18701 AGAAACACTG GGGCTTGTTT AAAATGCAGA TTCTTAGCCC CAGJCTCAGC 
18751 GATTCTGATT ctgtatatct gaagtgggac tcaggaatgt TGATTTTCAA 
18801 CAAGCTGACC AGAGGGTCCA ATGCTGCTAT TCCTTTAGTT ACACTTTCAG 
18851 AAATATTACT GTAAATCAAA TGGCAAGAAT AAAATAGTTA TTTGAGGCAG 
18901 TTTTAGTATG TTGGACCTGG AGTCCAAAGA CTTGGGTCAA ACTCCAGCTT 
18951 TGTCAGTTCC TAGACCTGTG ACCTTAAACA GCAACCTTCT CTGTGAACCT 
19001 TAGTTCCCTC AGGAACGGCT CTGGTCACCT CCTGCTGTAC TCCATTGATG 
19051 ACTCACCACA TAAGGCTCCC TGGGAGTCCC CCAAACCTTT GCTCTCTTAA 
19101 CTCCTTTTAC AGCCTCCTAC ATCTCCTGCA GGTGCTGTCT TCTCCTCCTT 
19151 TTTCCAGGCC CTGCTCTGAC ACAGCATTCA TTCTCCTCTG GGAAGGGTTC 
19201 CTTCAATGTG TCTCCAAGCA CATCACACCC AGGAAGGACC CTGTGGCCAT 
19251 ATCTGTCTAT CACCAGATCA AACTACGTGA AGGCAGGCAC TAGGTACTGT 
19301 CAGTGCCCAG CATAGGCCTG GCCCATACCA GGTGTCCACA GATGCCTAGT 
19351 AAAGAAACCT ATGATTCAGG ACCCCCATGA TGAGCAACTA TAGCACTAGA 
19401 ACAGTGATAA TAACTAATGT TTATMTGCA TCTTCAGTTT ACAGAGGGCT 
19451 TTTGTACTCA TCATCTAGTT TAGTTCCTGC AACAACCTCJ TGAGGAATAT 
19501 AGCACAAGCA GGACAAGGGA AGCCCAGAGA TGTTAAATAA TTTATCCAAG 
19551 TTTATGCTGC TGGGAAGGGC AGCACTGAAA TTAAAAGAAA AGTTTTCTGA 
19601 GCTCAAATCC CATGCCCTTT CCTCAATGTG AGCTCTAGCA AGGTATTCAG 
19651 GAATCCTGCC TCTACAGTTC AGAGCCTCAA ATTGCTGGGT ATGTTGAGTT 
19701 CTTGTATCTG ATTTTTCTAG ATTTCCTGCC CACATTCTTA CTGTCTGGAT 
19751 ATCAGGAAAG AGTTTATCAA ATGCCTGTGG AAATCCAAGA TAAGGTCTCA 
19801 TGATGAGTAA CCCAGTGAAA ACATGAAGTC AAGTCTAACT AGTCACTACT 
19851 ATTTCACTAC TGCTGACTCC TGATGATCAG CTCCTTTTCT AAGTGCTTAC 
19901 TGTCCACTTA TTCCATCATC TGCCTAGAAT TTATGTGAAG GAATCAAAGC
19951 AAAAGGATCA TAAGGCTTCC TTTTTCCAGT ATGTTTTTCC TCCTTTTTGA 
20001 AAACTGGGCC AGTTAGCTAT CTCCATTTTT ATTTCATGAA TACATCCCCA

fZ S TATAGTAGAT AT^CATT ACACTTTGGA OATATTGCAC 
20101 fTATTCTCCA GTTTCTCCAA AGTTACTAAC AATGGTTCCA TCACTGTGCC 
a Katttt ™cm tat™ aawttctc ccagtctgaa 

20201 AATPTGAAGA CATTTCATGT GACTTGGTAT CCTCATATGT CTTGGGCTTC
20251 CAATTCTCCA TTCCTAGTTT CAAGTTCATG AACTGTAAAA CAAAGGATTA
20301 GACTAAATCT CTAAAGTTCT ATCCAGATGC CAMTTCTTT TCTCTTTCCA 
20351 TGATACCTAA GaS&CC AAATATTGTC TTTTACCTGG TGTTTGTGAA
20401 CATGACATCA WTACAGGA GTAGCAGATA CTAAACTCTC ACTCTCTAAA 
20451 A^CTGACTG AGTTCCATGA GCCAGATACT GAAGTGAGCT TGTTCACATA 
20501 TGTTCTCATT TAATGCTCAT AACCCTGTGA AGCTGGGAAT TGCTGGGACA
20551 TTTTATTTAT T7ATTTATTG AGACGGAGTC TGGCTCTGTC ACCTAGGCTG 
20601 GTGTGCAATG GCATGATCTT GGCTCACCGC AACCTCCGCC TCCCGGGTTC
20651 AAGCGATTCT CTTGCCTCAG CCTCCGCAGT AGCTGGGATT ACGGGGCACA 
20701 CACCAttACA tccagctaat TTTGTATTTT TAGCAGAGAT GGAGTTTCTC
20751 rATGTTGGCC AGGTTGGTCA CGAACACTTG ACCTCAAGTG ATCTGCCTGC
20801 CTCAGCCTCC CAMGTGCTG GGATTACAGG CATGAGCCAC CATGCCTGCC
20851 CGGGACCCTT GTTTTAGAAG GATGACTGCT GCTATAATGT AGAAAGTGAT
20901 TTGGAAGAGG GGAGGAGTGG GGCACGAAAG ATGGTTAGTA GATGGGGGTG
20951 GTMTGCTTA CCTTTCAGTA TTTGGAGGCT TCGGAGTCCT CAAAAATTCT 
21001 CTTCCTTOAT TGGAGTCCTC CCAGCCAATA GAGGGCTTCA CACAAACAGT 
21051 TTCTTGGGTT TTGAATTGTT TGACCAGAGC TTTCTTCCGA CAAAAGGTTG
21101 GGGTGATTCA TTCACTTACC ACACCTTGCC TGAACATTCA CTTGGGGCTG
21151 CGGGTTATGA A&GCTATTGT TCTCCAGCCT GTCACAGACG CTTTGAAGAC
21201 CTGTGCCTCA GCTGGtfCTA AGGAGTCAGT TTG1TCAGCT CCGTGCCAGG
21251 TTTCCAACTt ATGAAATGTG CTGGAGATTA ACACCTCJCC TGCCATTTTA 
21301 TCCCTACW MTTGCCAGT CAAAGGATTC CTGCAGTTGC CTCTGGCAGC
21351 Stmctgat GAATGTTCTG CCAGCTGCTC TGAGGACCTA GAAGAGCAGT 
21401 TTTrTATCCA GGACCAGTTT CCAAGGGTGG GAGGGTGAAA TATATCCTCC
21451 ISrTGACAT TTCATCTCCC AGTGATGGGT GGCTTGGGCC CTTTGAAGTT
21501 Stcagg MCCACACAC TTGGGTCTGA GCAGCCAGCA GCTTATCACA
21551 TCTGGTGATC AATCC7TCAA AGGTTCCTCC TGAAGTCTGA ATTnTGGAG
21601 GTCAAATGGA TTCCACCTGG GAGGGGCTTC TGCTTCAACT CAGGACATGG

21651 rawrecT Stcctcttc Sggg^gg ca^cat ggcattgaga

21701 TGTCCTCTCA CTTATTCCCC ACCCACCCAC CAAGTCCTTT GTAAGAGGAG 
21751 TAGGGGGAGA GGAGAGCGCC TGCAGCCTCC TGCTCACATT CCTAGACACC

21801 gktckto gSgcc GCTGGAACAG CAGAGCTGTC TGAAATGTCA
21851 AGAGGAGTTA TGCTCATAGG CTCCCTGGCC TCAGTCTCTT TGTGGCTTGC

Am" KaCTG TGTTCATCAC = TCA GAGGGTACAA 
21951 TTAAAAGATA ATTTGCTAGT CCCAGACTTA ATTTGGGGCC CCCTTCTTGC

22001 Jtgattgaat tagaggggaa cataatagat ttttggtgag aaatagttgt

22051 CTGTGTGGCT G^AGAAAGA TTGCTCCCAG CTCTCCAGCT GGGCAGCCCT 
22101 TTCAGTATCC CGTATGTTAT TTCCCCACTT CCAGCCCACC TCACCTCCTC
22151 IgtStT GTGTGTCCCC TCGGCTAGGA TCCTGACCTC CTGCTCAAGA
22201 GTT^ACTC MCTTGAGAC CCAAGGAAAA TAGAGAGCCC TCTGCAACCT 
22251 rATAGGGGTG AAAAATCTTG ATGCTGGGAG CTATTTAGAG ACCTAACCAA
22301 GGCC^GACA GAGAGAGTGA CTTGCTAAAG GCCACATAGC TAGCCCACAG
22351 SsTTCTOAC AATAGTC7TA ATGATA7TAA TGGCTAACAT TTATCAACCT
22401 TTAATGTGTC CCAGACTTTG TGCCAAGGGC TTACATGCAG TGCATTGTCG

22451 SttoStc SSSgtct ggctctgggc ccaggctgag CTTTGGTATA
22501 GCATGGTAGA ACGTTGTCTA TAATGTCTAG TCTGGGTTCA AATCCTGGCT
22551 TCACTTCTCA CATTTACAGC TGAGTGACCT CAGGCAAGTG ATTTAACCTC 
22601 CCTGTACCTC AGTTGCT7TA TCTGTAAAGA GAAAAATCAC AGCACTGTGG 
22651 MTAGTGGGG GTTAAAATTC ATTCATACAA GTAGTGCTGC AAGCAATGTT 
22701 TAATACAGGG TGAGCACCTG TTCAGTGCTT CCTTCTTCTG GCTGCCTCTG
22751 GGGCTAGAGT GTGGTGTCTT CGTGGTATAG ATAGATAGAT ATGGCTGAGC 
22801 TCTGCACAAA CACCAAGAGC TGTTCTTCAC TATTAGAGGT AGTAAACAGA 
22851 GTGGTTGAGC TCTGTGGTTC TAGAACAGAG GCCGGCAAGC TATGGCCCAT
22901 TGCCTATTTT AATACGGCCT GTGATTGA7T GATTTTTTTT TTCTTTTTGA 
22951 GACAGAGTTT CACTCTTGTT GCCCAGGCTG GAATGCAATG GCACGAACTC 
23001 AGCTCACCGC AACCTCTGCC TCCTGGGTTC AAGCGATTCT CCTGTCTCAG 
23051 CCTCTCGAGT AGCTGGGATT ACAGGCATGT GCCACCACGC CTGGCTAATT 
23101 TTTGTATTTT TAGTAGAGAC AGGGTTTCTC CATGTTGGTC AGGCTAGTCT 
23151 CGAACTTCCA ACCTCAGGTG ATCTGCCCGC CTCAGCCTTC CAAAGTGCTG
23201 GGATTACAGG CGTGAGCCAC CATGACTGGC CTGATTGACT GATTTTTTTA 
23251 GTAGAGATAG GGTCTTGGTT TGTTACCCAG GCTGGTCTCA AACTTCTGGC 
23301 TTCAAGCAGT CCTCCCTCCT TGGCCTCTCG AATGCTGGGA TTATAGGCAT
23351 GAGCCACTAT GCCTGGCCTA TATGACCTGT GATTTTTAAT GGTTAGGGGA 
23401 AAAAAAGCAA AAGAATGCTT TGTGACATGT GGAAATTACA TGAAACTCAA 
23451 ATATCAGTGT CCCAGCCTGG GCAACAAAGT GAGACCCTGT CTCTACAAAA 
23501 AATAAAAAAA AATAAGCCAG GGCCGGGCGC AGTGGCTCAC ACCTATAATC 
23551 TCAGCACTTT GGGAGGCCGA GGCAAGTGGA TCACCTGAGG TCAGGAGTTC 
23601 AAGACCAGCC TGACCAATAT GGTGAAACCC TGTCTGTACT AAAAACACAA 
23651 AAATTAGCCG AGCATGGTGG CATGCGCCTG TAGTCCCAGC TACTTGGGAG 
23701 GCTGAGACAA GAGAATTGCT TGAACCTGGG AGGCGGAGGT TGCAGTGAGC 
23751 CAAGATCGCG ACACTACACT GCAGCCTGGG CAACAGAGCG AGACTCCGAC 
23801 ACACGCACGC ACGCACACAC ACACACACAC ACACACACAC ACGCTGGGTA 
23851 TGGTGGCCAG CACGTGTGGT CCCAGGATGC ACTG6AGGCT TAGGTAGGAG 
23901 GATCACTTGA GCTTAGGTGG TTGAGACTAC AATGAACCAT GTTTATACCA 
23951 CTGCACTTTA GCCAGGGCAA CAGTGTGAGA CTGAATCTCA AAAGAAAAAA 
24001 AAAAAAAAGA AAAAAATCTT TCCATAAGTA MTATCTGTT GGAACATAGC 
24051 CATGTCCCTT AGTTTATGTT TTATATATGG CTGCTTTTGC CCTATAATGA 
24101 CACAATTGAG TGGCCACGAC AGTCTGTATG GCCTGCAGAG CCTAAGATAT 
24151 TTGCTCTCTG GCCCTTTACA GAAAAAGTGC CTTGACCTGT GCTCTAGAGC 
24201 CATATGTACC AGGTTTGAAA CTCAGCCTCA CAGCTGGGTG TGATGGCACG 
24251 CATCTGTAGT CCCAGCTACT CTGGAGGCTG AGGTGAGAGG ATCACTTGAG 
24301 TCCAGAAGGT CGAGGTCAAG ATTGTAGTGA GCCATGATGG CATCACCGCA 
24351 CTCCAGCCTG AGTGAGAGAG AGAGACCCTG ACTCAAAAAA AAAAAAACAA 
24401 AAAAAAAAAA CACCCTCACC ACTTATCAGC TATTTGTCTT GAGAATAGTG 
24451 ACATAACGCG TCAGAACCTA TTTCCTAATC TGTTAAATGA GGCTGATGAC 
24501 GTTTCCTCCT TTTACTGGCA ATTTAAACAT GATGGATAAT AAATGCTAAG 
24551 CACTTAACAC AGGGCCTAGA AGATATTAAC TGCTCAATAA ATGGTAGCTT 
24601 CTTAACAGTA TTCAAACCCA TGTGCTCTTA TCACATGCAT TGTTGTCCCT 
24651 GTGTCCAGTT GGTGGAATGG GAAAAGGCTC CCTTGTAACC CCATCTACCA 
24701 TCTTTATCAG ACTTTCCTGC CATGGTTCAC AGTAAGAGAT A6AAGCTGCA 
24751 CGGTGACTTC TGGCTCTTTA CAATGGTGAG CGGTGTGTGC CTGGTAAGGG 
24801 AGAGCTGATG TCACTGCCCC AAATCCAGTA GTGAGATCTG AGTGTTCTGG 
24851 TTTCCTCCAG CAGCCTTGCT TTTTCCTTTA CAATCCTGCA GGCAGGGAGA 
24901 CAAGGGCTTT CTACATGGTA GGCTCTGGTT TGGTCATCGT CACAACTGGG 
24951 GGCTGTTCAG GTGGGCTCCC ATTCCAGATA CCTAGGCTTA TCAATCCCTT 
25001 TTGGCACCCC AGGCCTTTTT CTCCCTCATG CCCCATTTTT CAGTTTGAAA
25051 AGCATGGTTA TCACAGGACA AGTAGAAGAA GCTCCACTGT CCACTGAGGC 
25101 CAATGGATGG TGTTCTGCAT GTGAACACTC AGTGAATAGT GAGTGAATGA
25151 GAGTAACCTG GGCTCCATCC TATTTGGAGA GAGCTTTGGA AAAGATTTTT 
25201 CTCCTTAAAG AGCCAGAATG AAGCCTGGTA GTGGGAGAGC TCCAGCTCTA 
25251 GAGTCACATG AGCCTACATT TAAATTCCAG CCCTGCCACT GACTCCCTTT 
25301 TTGACCTTGA GTGAGTTACC TAATCTCTCT GTACCTCACT TTTCTTGTCT 
25351 GTAGAGTGGG AATAATTCCT GTCTCAGAGA AATAAAAGAG TGCATATAGT 
25401 GTTTGCCACA TGGAGACACA TCAGGTGTAG GTTAATACTC TGGGCCTTGT 
25451 TTCCTTATTT GCAACACAGC CCTGCCCTGG AGTGGAAGTG GCACCTCCCA 
25501 TTGGTCAGCT CTTGAGGCTG TCCCCAGGAC AGGCAGAGGG AGGGAATGAA 
25551 TGGGAGCCCT AGTGCCAGGA CAGAACAGAT GGCAGCTCAG AGCTAGGATG 
25601 GCTCTCTGGA CCTGTCTCTC CTACCAGAGG TCCCCCCGTC TGGTGTGGCT
25651 CTTCCTGGAC CTGGCATCCT CTGCTTTTTT TTTTTTTCCA CCTCCAAGCA 
25701 GAATTACTGT CCTGTAGGCA GCTCCTCTGC TTGAGGACAT CTGGGGCCAG
25751 ATATGTTCAC ACTCTATCCT GCCTTGCCCT TCCCTGAGCT CAGGATGGAC 
25801 GCTCMTTGG TCCCAGTTAT TGTCTGCAGC GCCTGCCTGC AGCCTCGATC 
25851 CAGCCCAGCT CCACCCCTTG CCTGCAAGGT CTGTTTCCTA ACAGCTGCTC 
25901 CAACCACACA CCTCGGTTCT GCGGGAGCCC CTCCTCTTCC TCCCTCCCTC 
25951 CCTCATTCAG GGGTGGGACT GAAGAAGAAG GCTAACTTGA CAGCAGCGCT 
26001 TCTTTCHAG CTAGTCACCG GCCCCTGCTC AAGAATGCCA GTGTGTGTGT 
26051 AGCCTCCACA GAGAGGTCGT TTTCTCGGAG TCCAGAGGGG CCGCCTGAGC 
26101 TTCTGAGAAC TAGGGAGGAG CCATCCCAGC CATGAGCCCC TGTGGGAATC 

26151 tcctS CAAGTGGCCT ggagtcctca ggctcccgca gctgctccgg
26201 a^gagaggt gagctcaggg cagcctgcct gcagccagag gtgccgggag 
26251 ccccgggcct gtcatggtgg ccatctacag ccggcctgag gcagtcacag 
26301 acggatttgc agctgagcct 6tctatctgg tgtgggaaga agatggggag

26351 TTACTiScA GTCCCGGCTT ACTTCACCTC CAGAGACCTC TTTCGGTGAG
26401 TTGGTCTCCG AGTTCCCCTC TCCATCTCTC CTGGCCCCTG GTCCTGAGAG 
26451 gaStc TCCCTAAATC TCCTTCTCAC TTAGTCCTTT ACCATCGGTT
26501 CTGCCGGGCA GAAGCCAGCG GAGGTTATAC CCAAGGAGAA TCGGCCTTGT 
26551 CATTATGTCC TGGAAGTGGT GAGGGGAGGG AWACCCAG 

26601 AAGGAACTTC TTAGGGAGCT CCAGCTCCCC TTCTATCCCA GACAAACCTG 
26651 AAGGAGCCTC CAAAAGATGC CACTGACCTG CCCATTGTAG ATGTTACTGC 
26701 TTCCGGGGGG AATAGCCCAA ATAGAGTGCT GTTTCCAGCT CTCACATGTC 
26751 TTACCTGCGG GCCATGCTGC CTGCCCAGGA ATTTGTCCCA ACAAGCAGGA 
26801 T^CAGGTT TTGCCAAACT GTGGAAACTG GCAAGTCCTG GGTGTGGGTA 
26851 6CCTCCTACA CAGTAGGCAC CTTATAAACG TTTGTTCTCT TAATGGCAGG 
26901 WCATTTGCC TCTGGCCTTG AAGGGCTTCT GAGCTCCCAG GTGAATGTAG 
26951 TTKTGGGGA AAGACCTGGG CGAGTGCTTC TAAGACTGGA GCAATGGGCT 
27001 TTAGAGTCTT CCTGAGCTGC TGGGCCAGCC CCCACACCTC CTCAGTCCCT
27051 SttSEr ACCTCCACGA Gl^C TCA^G

27101 atgtggaaac tctacctcta acctggcttt ctttgctcat tgccccactc 
2 Sat agaaactccc cagggggtjt ctggccctct gggtcccttc 
27201 tgaatggagc cattccaggc tagggtgggg tttgttttca ttctttggga 

27251 GCAGCCTGTT GTTCCAAAAA GGCTGCCTCC CCCTCACCAG TGGTCCTGGT 
27301 CGACTTHCC OTCTGOTT CTCTAAGCTA GGTCCAGTGC CCAGATCTTG 
27351 CTGCCGGGAT ACTAGTCAGG TGGCCAGGCC CTGGGCAGAA AAGCAGTGTA 
27401 CCATGTCGTT TTGTGGAATG ACCGGACCCT GGTAGATTGC TGGGAAGTGT 
27451 CTG^S GGAAGGGGGA AGGGAACTGG TCCTCAATGC TGACTCTACC
27501 MGCGCCCTG CTAGACACTT TATCCTTTAA TCTCTCAACA GCCTAAAGA6
27551 ATTATATATC CCCATTTTAC AGATGAGGCA ACCAGTTTCA ACAGAGTTAA 
27601 CATATGGAGC CTCACTGG6C AGCTTTTTCT GTCTTCCTGA CTTTCTCTCA 
27651 TCCTTCAGGG GGCTGCAGGT TTGTnTCTT CTCCTAGTGG AGAGGAMTT 
27701 CTCAGGTTTG TTTTCCTCTC CTAGCAGAGA GTAAAAAAAG GGATAGTTTG 
27751 CCTGACTTGT TGAAGGTGTG GCTGAGATTG TTTTCTAAAG AGCCAATGGA 
27801 AATTGATCTT GAGTTTAGGA GAAAGCTTTT ACATGTGGAA TTAAGATGCC 
27851 AAGTGTTGAA GTAGCCACAT TTCAGGTCCT CATTAATTTC TCTTAATCCT 
27901 GGGAAGGCAG CTTAGGAGAA GGGTTGTTCC TTTAGGAGCC AGGAACTATA 
27951 CCCCTTTTAC CCTTGGAGAG GCAGGGAAGC CAGGGAGGAC ACAACTTCTC 
28001 AGGAAGAGGA GAAGCTAGAG CAGATAGTGA ACTCTCAACC TGAACCTTTA 
28051 AGGGCCAGAC CACTAATGCC ACCCAAGTCC ACCTGCCGTT TGTCTTGTTC 
28101 TGTCCCAGGC TTTCTGGAGA ACCTGATCTT CTTGCCCCTA CCCCCAAGCT 
28151 CCGTTTGCCC AGCTAGAGTC TGGGGGGTAC TGACTGACTT TCGTAGACAT 
28201 TCTTCCCTTC CCCAAATAAG AGGCCACATT CCTGAAGTCA CTTCTGAAGA 
28251 GATAGCTGCC ACACAGGGCT CTTTCCCCCC AGGGAGGGAC CACCCAGACC 
28301 CTCTGCTCTC CCAGGTATCC GTTACCACAT CACTACCTGG TCAGAAAGCT 
28351 GTTTCTGCCA TTAGCCCCTC CCTCTTTTAT TATAGGATAT CCTCAAGGGC 
28401 TCCTCTTTGG GCCTCAGTTT CATCCTTGGC AGAAAGTAGA AGCTAGACTT 
28451 CTTGGGCTCC TGAACAGGGT CCTTGCTGGA TTCTGTGAAA CAAATTAAGT 
28501 TCTTGACCCT AGGCCTCTGG GGGAGTACAA AGTCTATGGG AGTTCTGGGG 
28551 CTGTGGTTGC AAGGAAAGTG ACGCAACCAG ATTCCATGGG GACATGATCA 
28601 GGCGTGACAT GTGAGGGAGG AAGAGGGAGC AAGGGMTGA AGAATACAAC
28651 TTCTGTGTCC CATACACCCC TGCCTGACAG GCCATACATA CTCAGCAGAG 
28701 AATGCACTGT CTTTCCTACC ACACTAGCGT GAGGAGTGAG CTGCAATTAC 
28751 CACTGTGC7T CCAAGTAAGA AAATACCTCA AATTGGAATT TACAAAAGAG
28801 GTAAATTAGG GAGTGGCTTT TGTCGGACAT CTTTAAAGCA TnTTCTTTT 
28851 TATAGAATTT CACTTAATGT CCAATACTGA TTTAATGAGC TTGGGTTTAC 
28901 ACATTATCTC TTGAAGAAAA CAAATGAACC TTTGTGTTCC AAAGCAATCC 
28951 ATGTTTAAAG GGAAAAAATT ATGCATAACT CTGCCCAGCT TCACAGTAAC 
29001 CTTTGGCAGG TGCCTTAGGT CCTCTGGGAC TCTTTTCCTT ATCTGAAAAA 
29051 TGAAGGACTT GGATCAGGTG AATGGTTCCC AGCTCTGCAA CTTATGTGGC 
29101 TCCTCAGAGG CACACAAGCT CTTTTCCATT ATTTGCCAAA TAATGGAGGC 
29151 CCTGTCTTTA ACTGCAGTAC AACTACACAA AATACTTGAA ACTACAGTCT 
29201 TCCTGGTT7T TGGTTGGAAC TGAATCAGTG CACTCTAGCA ACACTTATTT 
29251 CTTGCTGTTC GTAGGCTTCA TTATGTGTTT GGTTAATTTT TTAAAACAAC 
29301 AATAACATAT TCCATAATAA TTACAGCTTA ATTGGCAGAC TGTTTCAGTC 
29351 TATAGGATCT GCAGGAAGGA GGAGTAATAA AGGGATTTTT GACTGAGCTC 
29401 TTATGGAACA GAGTCTCTCT AGGCCCCTGT CATATCTGCC CTTCTGGGCC 
29451 CTGGGGAAAA GTTGGCATCC CCAGTTGTGG TGCTCTCCAG GTGCCCTCAG 
29501 GCTGTGGTGG AGGGAGCTTC CCATTCTCTC CTTCAGCCCA CTCAATtCAG 
29551 AGGCTAGGGG CTGAAAGAAG CTTCTCTACA ACTGGCTGTT CACTGGGAGG 
29601 TTAAGGGATG ACCATCCAGC CAGGCCHCC TCAGGACATG GGAGGGCTTA 
29651 TGCTTTAACA TGTGTAAATC CACTGCAATA ATGACTGGTT CTTTTACCCC 
29701 ATAAGGTTGA GAATTTACCT GTAAACATTT TTGTCTGAAG AATTTGGATG 
29751 TAAGTGAGGG CTGGGCCTCT ATCTTATCTC ACTTCGCTTC TCTCAGCACA 
29801 GCACCTTGCC TGCTTGTTCT TACACATCCT AGATGCACAG TAACTATTTC 
29851 CTAATTATTA GAAATCTA7T AGAATCAATT GATTTCAGCT GGGCTTGGTG 
29901 GCTCCTTCCT GTAATCCCAG CACTTTGGGA GGCTAAGGCT GGAGGATCAC 
29951 CTGAGTCCAG GAGTrTAAGA CCAGCCTGGG CAACATAGGG AGACCCTGTC 
30001 TCTACAAAAA ATAAAAAATT AGCCAGGCAT GGTGGTGTGC ACCTGTAGTC
30051 CCAGCTACTC AGGAGGCTGA GGCAGGAGGA TCTCTTGAGC CTGGGAGGTC 
30101 AGACTACAGT GAGCAATGAT TGTGCCACTG CACTCCAGCC TGGGTGACAG 
30151 AGTAAGACTC TGTCTCTTAA AAAAAAAAAA AAAAAAGTTG ATTTCTATTT
30201 GGATAGATAA AtAATTCATT TTAGGACCTT TCTTTTTCAC TTACAGAAAT 
30251 CTGTTTCATT CTGGGCTGAG AAGCAGGTCC ATATTGCTAG GCATAGGAGA 
30301 AAAAGGGGTC TGTCTGCATT TGCCCTTGGT GGTCTCAAAT TGGGGAGGGA 
30351 AAGAAATGAA CACTTACTGG CTACCTTCTG TGAGCCAGGC ATCATGCAAG 
30401 ACATCTGTAC ATAATTTAAT TCTCATAACC CCATAAGATA TTATTAGCAA 
30451 TGTACAAGTG AGGAAACTGA GGCTCAGAGT CATGAAGTAA CTGGCCTTGG 
30501 GTGACACAGA TGGTAAATGG CAGAGAAGGA ATATGGATCC AGGTCTT6AA 
30551 AGAGAAAATC TCAACTGATT ATCTTTTTTA AAAAACTCAT ATGTTCTCTG 
30601 CTGACTCAAA AGGTCTCTGT GTGGATCTGG GTTGACCCAC TGAACTGACC
30651 ATCAGGGTTC CATGCACTTT GTATCTGCCC AAGCCCTCAG MCCCCTCAG 
30701 TAATGTTTTG GAAGATGAGT TTTGGAGGTT GTCCTTAGGC ATAGCCTCAG 
30751 CGTATGTAGG CCTCTAGGTG ATCTCCCCTA ACCTGAGGAT TTCAGCTCAA 
30801 TTCACTCTGG CTCCTCAGGA CAGTGGGATG ACTGGTTCAG ACCTCAGCTT 
30851 TACCACCTCC CAGCTGGGTA CTCTTCTACC TACAGCCAGG GCAGATTTTG 
30901 ACTTTCACTT GAAACTTCCA AAAATTGAAA GGTAGAAAAA CAGCCTTGGC 
30951 TTTGGGAAGA ACGTATGATG TCCATGGCCT CTAAGCATCT GAGGTGGGAC 
31001 ATGTTCGAGT AGCACCTTAC AGTTCCAAAG TGTGTTCTGG GTTCTTTGTT 
31051 TAAAAGAACA GAGACTGCTG GGGAATTGAA CACTGTGAAG TATATGAAGG 
31101 AGGAGAATTG TGCTATTTAA CATTCAGTAC 7TGGGCTAAA GGAGAAGCAT 
31151 GACGAAGTGT TAACACTCAA AGGGTCTTGA GCTGTCAGGG CTCCAGCTTC 
31201 CTTATTTTCA CAGGTGAGAA TCCTGAGGCT CAGCTGTTGA GATGTGCTGT 
31251 CTCACTCCGG TGACATAGTA CAGTGGATGT GGCTTTGCAG CCAAGCACAC 
31301 ATAGCTTCAC ATTCCAGCTC CATCAATTAT GTATTGGGCA GCTTTGCAGA 
31351 ATGATTTGAC TTTAACTCTG CTTTTCAGTC TTCTGTAAAA CAGGGATAAT 
31401 CCTGCTACCG TAGGGTTGTC AGGATTAGAG ATAATATAAA TAAGGTACCT 
31451 CATATAGGAC CTGGATTATG GCTGGCATTC AATAAATAGT AGCTGTTAAT 
31501 TGATAGCTAA GCTAGAACTC TGAAGTCTAC CATGGCAACT TCTTAAGTGG 
31551 TCTGAGAACC CAGTTGTGTT CTGTGGCAAA ACACAGCTTA GGGATCCATA 
31601 CCCAGCCCTC CTGTCAGCTG TTCACCnCC AGTTCTTCAG AGACATGTGT 
31651 GGCAGTGACT TTGGCCACAT AGCTGGCTGT GCCCTTTAAA GGCATTCCTT 
31701 GACACAGATA TGTGGACTGG TGACGTTGCT CTCCAGCCAG GTGTTCTTCC 
31751 CAGCAGGCTG GCCTGGCTGT CTCCTGCATG CCTGTACTTG TTTGTCTCCC 
31801 TGCTCCCTCT CCTGGGCCTG GCCAGAGCTA CTTGCAGCAA ACAAAAGCAG 
31851 GATATTGGCA ATGGAAAGGA GGGTGTGTTC TGGTGCTCCC ATGCCCTGCG 
31901 GCGCACATAC CATTGCAAGG GCGTAACAGA GCCCAGGCCT GCATTTGGGT 
31951 GCAAATAAGT CTGCACACAG AAGAAAAGAA GGACCTGGTG ACCAG6AGCC
32001 ATGGAACCCT TGTGCTGCCC TACCTGGGGT ACTGGTTGTT GCCACTCCTA
32051 CCATTTTCAG TTrGGAAATA TTTGTTAAGG CTTTGCTCTT CCAGGTCCTT 
32101 TGCTTGGTGC TGAGTCTACC AAGAGTAAGT GGGATGCTGT TTTTGTCCTC 
32151 AGGGAGCTAA CAGTCTAGTG AAGAAGAAAG ATGGTTGCCC AGGAACTTCT 
32201 AAGTCAGAAG GCAGGAGGCA AGAAGGAAGC CCCTGCTCCT ACTGCCAGCC 
32251 CTCTGTTGGG CACCCCATAG TTCTTCAGAA CCACATTTAA TCCTCACTGC 
32301 AGGCCAGGCA TAGTGGCTCA CACCTGTMT CGCAGCACTT CGGGAGGCCA 
32351 AGGCGGGCAG ATCACTTGAG GTCGGGAGTT CGAGACCAGC CTCACCAACA 
32401 TGGGGAAACC CCGTCTCTAC TAAAAATAGA AAAATTAGCC GGGTGTGGTG 
32451 GCATGCGCCA GTAATCCCAG CTACTCAGGA GGCTGAGGTG GGAAAATCAC 
32501 TTGAACTCGG GAAGCAGAGG TTGCAGTGAG CCGAGATTGT GCCACTGCAC
32551 TCCAGCCTGG GCGATAAGAG CAAAATTCCA TCTCAAAAAA AAAAAGAAAA 
32601 AAGAAAAAAT CCTCACTGCT ACCTTGAAAG TAGGTGATGA CATTGCCATT 
32651 TCACAAATGA GAAGTGAAGG GGCTAGCCCA AGATCACTTA GGTGGTAAAT 
32701 GGTGGTGCTA AGATTAGAAC CTCAGATCAT CTAGGGAAAA ACACAGATAT 
32751 GCACAGAGTT AAG6GGACCC AGGGTATTGT TTGTCCTCTT GTTTCACAGG 
32801 TGGGGAAACA ACCCAGAGAG GGAAAGGGGC TTGTCCAAGG CAATTTAGCA 
32851 CCCAAGAACT TGAACCCATA TCTCTCTCCT CCTCATTTAG AGCTCATCCC 
32901 ACATGTATCT TATATTGAGA GGAGTGTGAG CCACATACCA AGAACAGTCT 
32951 TCCCCTCTGC CTCCAACCTC ACTGTGCAGT TTTGAGACAC TTCACAGCCA 
33001 TACTCTTCAT GCCATACCCA GCCCTTAAGA CCCTGAAGTT CCCC1TCCAT 
33051 AAGACAAGTA GGAAAAGCTA TAGGGTAAAA ATAGCCATCA GTGTTTGTTG 
33101 AGCACCCAGG AGGAATTGGG CACTCCAGAA AGATAAAGGG ATTCTCAGGG 
33151 ACTTGCTTCT CTAGACTTCC CTAGCTCAGC TGCTTCAACT CATTCCTGCC 
33201 CCTCTTCTCT ACCTCCCGCA GTGCTCAGAA GTAGTAGAAC TCACTCTGGC 
33251 CTCTCACCTT GCATTGTTGA GTTTTATTTA GACTTTCTCT TCCTCAACTC 
33301 TTCATAAGCT CATGAAAGGT GAAGTAGGGT GCCCTGTGTA TTTATCTTrT 
33351 ATATCTGCAG TGCTTAGCAA GTTATAATAA TGCACTTGCC TGGCAAAAGG 
33401 CTTTCTCTCA TACATTAGCT TATTTCCTCT TCACATTGGC TCTTTGTAGT 
33451 AATAGGATGC TATTAGTTAT TTTCAATGAG AGAAAGCTAC TMGAGAAGT 
33501 TGTCCAGCTA GTGACAGTAA GTGGCTGATA AAGTGAGCTG CCATTACATT 
33551 GTCATCATCT TTAATAGAAG TTAACACATA CTGAGTTTCT ACTATATTGG
33601 GTCTTTTTTT 1 I I IIIIIII TTTTTTTTTA GAGACGGAAT CTTGCTCTGT 
33651 TGTCCAGGCT GGAACGCAGT GGTGCAATTT TGGGTCACCA CAACCTCCGC 
33701 TTCCCAGGTT CAAGCGATTC TCCTGCCTCA GCCTCCJGAG TAGCTGGGAC 
33751 TACCAGTGCA CGCCACCACG CCCGGCTAAT TTTTGTATTT TTAGTAGAGA 
33801 CAGGGTTTCA CCATGTTGGC CAGGCTGGTC TTGAACTCCT GACCTTGTGA 
33851 TCTGCCCGCC TCAGCCTCCC AAAGTGCTGG GATTACAGGT GTGAGCCACC 
33901 GCGCCCTGCC TATATTAGGA CTTTTATATA AGCTATCTCT AGCTAGCTAG 
33951 CTAGCTAGCT ATAATGTTTT TTGAGACAGA GTCTGACTCT GTCACCCAGG 
34001 CTGGAGTGCA GTGGCGTGAT CTCGACTCAC TGCAACCTCC ACCTCCTGGG 
34051 TTCCACTGAT TCTCCTGCCT CAGCCTCCCG AGTAGCTGGG ATTATAGGTG 
34101 CATGCCACCA CGCCCAGCTA ATTTTTTGTA TTTTTAGTAG ACCAGGTTTC 
34151 ACCATGTTGG CCAGGCTGGT CTCGAACTCC TGACTTCAAG TGATCCACCC 
34201 GCCTCGGCCT CCCAAAGTGC TGGGATTATA AGCATAAGCC ACTGTGCCCA 
34251 GCTGCTCTCT ATATTTTTAA TACATATTAT TTCCATTAAT TTTCACAGCA 
34301 GTTCATTTTA TAGATGAGGA AACTAGGCCA GAGAAGTAAA ATATCTTGCC 
34351 CAAGATGATG TAACTAGTAA GTGGCAGGAT CAAGATTCAA ACCAAGCAAT 
34401 GTTCAAACCT CTTGGAAGCA AGAATGTGGC CACTGTGGAA GGTGCAAGGC 
34451 CTTGACAACA AGAATAGGGA AAAGAAGGAA GTAGAAGGAA AGAGATGGCA 
34501 TGGGCTCAGC AGGCCAGGGA GCTCTTAGCT GTGtGTGTTG GGAAGCTCAG 
34551 AAGGGAGGAA GAGGTTGTCT GTGCAGGTAA GTCCTGAGAA CACACCAGAC 
34601 TTTTGAGAGG TGGAGCTTCA TAGCCAGGTC ATTAGGGGAG AAGGGAGCTA 
34651 TAGATTT7TT 1 1IIIIIIII 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 IAG AGACGGGGTC 
34701 TTACTATGTT GCCCAGGCTG GTCTTGAACT CCTGGGCTCA AGTGATCCTC 
34751 CCACQCAGC CTCCCAAAGT GCTGGGA7TA GAGGCATCAG CCACCCCGCC 
34801 CAGCGAGCTA TGGATCTAAC ATGTACATCT TACACAGTGC TAATAGAATG 
34851 TTGGGTTTCT TCCCCAATAT TTTATTTTGA AAAAAAATTC AAATATATAG 
34901 AAAAGTTGAA AAATGTAGTT CAAAGAACAC CTACATACCT TTCACATAGA 
34951 TTCATGATTT GTTAATGTTA TGCCACTTTG TATATATCTC TCTCCCTCCT 
35001 ATCTGTATAC TTTTATTTAT TTATTTTTGC TGAACTATTT CAGAGTAACT
35051 TAAAGGCATC TTGATTTTAC COTGAACAG TTCAATATGT TTCTGCTAAG
35101 AATTCTCCTA TATAAGTCAG ATATCATTAC ATCTAAGAAA ATTCACGGCA 
35151 ATTTTACAAT ATAATATTAT AGTCCAAATC CATATTTCCT CAGTTGTTCC 
35201 AAAAAATGTT CATGGCTGTT TCCTTTTTTA ATCTAAATTT GAATCCAAGT
35251 TTGAGGCATT GTATTTGGTT GCTGTGTCTC TAGGGTTTTT AAAATCTGTG 
35301 CCTTTTCTTC TCCCCATGAC TTTTTAGAAG AGTCAAGACC GGTTATTCTT 
35351 ATAGAATAAC CCACATTCTA GATTTGCCTG ATTAGTTTTT TTATACTTAA 
35401 CGTATTTrTG GCAAGAACAT TACATTGGTA ACGCTGTTGG TGATGGGTCA 
35451 GTTTTGAAGA GTGGAGATGA TTAAACTGCT TTTGTTCATT GAAGTATCTG 
35501 TCAAGACCAG AGATCCTTAA CTGGTGCCAT AAATAGGTTT CAGAGAATCC 
35551 TTTATATATA CACCCTGTCC CCCACCTAAA TTATATACAC ATCTTCTTTA 
35601 TATATTCATT TTTCTAGGGG AGGCTTCTTG GCTTTTATCA AATTCTCAGA 
35651 GGGCCCCAAG ACCCAAAGAG GTTATGAAAC ACTAGTCTGT CCACTGAGGC 
35701 AGGCAACACA GAGCTGGTTT CTGGGGCCTT GTTCAGTCTG AACCAGCTTC 
35751 CCTTGGGGAG ATAGCACAAG GCTGTAACTT TGCCCCATCT TGGCTTTGGA 
35801 TCAAAGAGGA CTGTCCATTT TGTTGTCATA CCTAGGAACC AGGGACAGCT 
35851 TATGTGGCCT GGTTCCAGGG ATCCAGGAGA ATTTCAGTTC TTGTCTTGCC 
35901 TTTCAGGTGT TCAGAATGCC AGGATTCCCT CACCAACTGG TACTATGAGA 
35951 AGGATGGGAA GCTCTACTGC CCCAAGGACT ACTGGGGGAA GTTTGGGGAG 
36001 TTCTGTCATG GGTGCTCCCT GCTGATGACA GGGCCTTTTA TGGTGAGTGA 
36051 ATCCCTTCAT ATCTGCCCCT CTTGGTCTTC AGAGTCCATT GACAGTGCTT 
36101 CCAGTTCCCT GTGGCCTGTT MTCTTTTAG TCTTTCCATC AGCCAGGGCA 
36151 TCTCCCTTTA TTTATTCATT CATTCAACTA GCAGGTATCA ATTGAGCACC 
36201 TACTAAGTGA AAGGTAAGAT CCTTCCCTCA AAGACTTAAT AGTTGAACGT 
36251 TGG6AGTGGG AGGAGAGGCA GGCAGAGAGG AGACACAATA TAGTTGGATA 
36301 AGGACCTCCA AGGAGAGTGT TACAGGCTGA GAGGAGGATA TACTTAGGTT 
36351 GTCTTTAGGG AATCAGAAAA GGAGACTCTG GAATAGGCTG GCAGAGAGAG 
36401 GGGCTACCTC CTATACCTGC TCTGGACAAA CGACTTTAAG CATAGTGACA
36451 GATTTGCCAA CCCTGTATTG GAAGAACTGA TGT7TTTTAG TGGGGATGAT 
36501 TACTTCTGGG GATTTCTTCT CATAACTGAG ACCAAAACAG TTTTGTGCAG 
36551 TCTCAGAAAT GACAGGAGGT ACCAATCTGA CACTTCCTTT GGAAGCTCTA 
36601 GGGCAGAGAG TGAAAGAGTG GATTTTGACG GGGGCCTTGC TTGGAGGTCA 
36651 TTCACCCACC CCTGTCCTCA CTCCAGCAAC AGTGATAACT CACTTCCTTC 
36701 CTCCCTTTGT ACACCCHCT CCCCACCTGC TCACAGGTGG CTGGGGAGTT
36751 CAAGTACCAC CCAGAGTGCT TTGCCTGTAT GAGCTGCAAG GTGATCATTG 
36801 AGGATGGGGA TGCATATGCA CTGGTGCAGC ATGCCACCCT CTACTGGTAA 
36851 GATAGTGGTC CTTTGTCTAT CCTCTCCCAT ATAAGAGTGG CTGGCGGGGA 
36901 GGGACAGTGG CAGGGTGAGT TGGGCAGAAG GAGTGTTAGG GTAGTCAGAG 
36951 CATTGGATTC 7TACCACAGC AGTGCTCTTA A^AGCTCTT TMCTTCTM 
37001 GCA6AATGAT TTACACATGT CTCTACCCTT TTTCCTrACC AACCTTGAAA 
37051 ATGTCTTCAC TCTGCCCTGC AATCCTCCCA GTGGGAGGCA CTCTTCAAGG 
37101 ACGATCCCAG AACATTAAAG TCAAAGACCC CTTAGAGCTC ACCCTGTCCA 
37151 ACCACCTTGG TTGATAAAAG AAGTCAGCCT GGGGCCCATG GAATAGAATA 
37201 GTACAAGGGC AAGGTTCTCA TTGTGAGTCA AAGGTAGAGT GAAGAGAACC 
37251 CAGACCATCT CACCCCAACC CAGGCCAGTG TTTTTCCAAA TATACCACTT 
37301 GCTGCAGATC TAGCTCAGCA CCCCCAGTCC CAGCCCACCC TGAGAACCCA 
37351 GGCTCCTCAT TCTGAGCAGC CAGCTAGAAT CATGACAAAG AGGGTGGTAG 
37401 TGAGACTATG GGTACTGTTG CTTAAAGCCA CATGGTGCAG TGGTTGCTGG 
37451 GGGGCTTCTG TGTGGGACTC TAGCATCTTA TTCCCCCCTG TGCCCTCTCC 
37501 CCAGTGGGAA GTGCCACAAT GAGGTGGTGC TGGCACCCAT GTTTGAGAGA
37551 CTCTCCACAG AGTCTGTTCA GGAGCAGCTG CCCTACTCTG TCACGCTCAT 
37601 CTCCATGCCG GCCACCACTG AAGGCAGGCG GGGCTTCTCC GTGTCCGTGG 
37651 AGAGTGCCTG CTCCAACTAC GCCACCACTG TGCAAGTGAA AGAGTAAGTA 
37701 TTTTGAGAAC CCTTCAGCAG GGGTTCTTGA GCAGAGTCTG TAAATGGGCC 
37751 TCAGAGGGCT TAGACCTCCA AAGTCTCATG CAGAACTCCC TTTATTCTCA 
37801 TCTCATATCT TTCTCCTGGA CCCCACTATG CTGTAACCGT ACCTGGGCCT 
37851 TGGCACTTAC TGTTCTCTCT GCCCAGGCTA CTTCCTACCC GATACTTAAG 
37901 GCAAGAATCA CTCACCTTTC AGGTGTCAGG TTTCAGGTCA TGTTTGCTCT 
37951 TTGAAATCAT CTGGCTTGAT TATGTGTATT AGTTGTTTAT CTTCTATCCC 
38001 CTCCACTAGA ATGTAAATTC CAGAAGAAAC TTGCTGTCTT ATTCAGTGCT 
38051 GCATGCCCAG GGCTTGGAAG AGTACCTGGC ATATAGTAGG AGTTGATTGA 
38101 TTATTATTTT GTCAGTCGAG AGAATGAATG GAGAAAATGT GGTCCATGGC 
38151 CCAAAAGAAG TTAAGACCCT ATCCTAGATT CAGGCCAGAG ACCAGATGGA 
38201 GAAAGAGTCT GTGTCTATCT AATACCAGTA ATGTCGTACC TCTGGCCGCT 
38251 TACCATGTAA ATATTGATTG TGTATCTACC ATGTGTTGGA CACTAGGCTA 
38301 GTGCTTGCAC AGCAGGTGAA AGATACTAGA GTTTGGGAAG TCAGGAGGAG 
38351 CTAAGGTCTG TTCTACAACC TTATTAGATG AAGAGGAGAG GGAATTGTGT 
38401 TCAGGGCAGA GGGAGAAGCA TTTCTCCAAA AGTAGGAGTC TTAATCATGT 
38451 CTGATGTAGG TTGAGTGTGG CCAGAAAAGG GGCTGTTAAG TATAGAGGGC 
38501 CTGGATTATG AAAATCCAGC AGATCCATTG AGAGTTTAAG CAGCAAGGTG 
38551 TTGTGACCAA GTTAACATTT TAGAAGGATC ACTGGTATGG AGGTTGGATT 
38601 GGAGAGGGGA AAGCCTAAAG GTATAGAGAC TAGTTAGGAA GCTATTGTAG 
38651 GCTGGGCATG GTGGTTCATG CCTGTAATCT CAGCACTTTG GGAGGCTGAG 
38701 GTGGGAGGAT TGCTTGAGGC CAGGAGTTGA AGACCAACCT GGCCAACATA 
38751 GCAAGACCCC GTCTCTGTTT TTCTTAATTA AAAGAAAAGT CCAGACGTAG 
38801 ACATAGTGGC TCACGCCTGT AATGCCAGCA CTTTGGGAGG CCAAGGTGGG 
38851 CAGATTGCTT GAGGTCAAGA G7TTGGGATT AGGCCAGGCG CAGTGGCTCA 
38901 CGCCTGTAAT CCCAGCACTT TGGGAGGCCG AGGTGGGCGG ATCACAAGGT 
38951 CAGGAGATCA AGACCATCCT GGCTAACACA ATGAAACCCC GTCTCTACTA 
39001 AAAGTACAAA AATTAGCCGG GCATGGTGGC GGACGCCTGT AGTCCCAGCT 
39051 ACTCGGGAGG CTGAGGCAGG AGAATGGCGT GAACCTAGGA GGCGGAGCTT 
39101 GCTGTGAGCA GAGATCACGC CACTGCACTC CAGCCTGAGC GACAGAGCGA 
39151 GACTCCATCT CAAAAAAAAA AAAGAGTTTG GGATTAGCCT GGCCAACATG 
39201 GCAAAACCCC ATCTCTACAA AAAGTACAAA AAAATTAGCT GGGTATGGTG 
39251 GTGCGCGCCT GTAATCCCAG TTACTCAGGA GGCTGAGGCA TGAGAATTGC 
39301 TTGAGCCTGG GAGGTGGAGG TTGCAGTGAG CCCAGATCAT GCCACTGCAC 
39351 TCCAGCCTGG ATGACAGAGT AAGATGCCAT CTCAAATAAA AATTAAAAAC 
39401 AAAGTTTAAA AAAAAAATAG AAGCTATTAC CGTGATCCAG GTAAGAGATG 
39451 TGAATAACTA CAATGATGGA AAGAAGGCAG AGTTCTTAGA GATGGGAGTA 
39501 GGAGAGATGA GGGAACTCCA GATTGGGAAG ATGATGTTCA AGTTTCTGGC 
39551 TTAGGCCACA GGGTGAGTGG CAATTCCCTT CACTGAGATG GGGCATCCTG 
39601 GAAAAGGTGT TGCCTTTCTG TGTGGGTATC CTGGGCCCCT TAGGGGCCAC 
39651 TGGTGGCCTG GGACCTGGTA AACCTTCCCT GCACAAGCAG AATTGGTCAA 
39701 GCAGGTTTTT AGGACATCTT TACCCTGCCT CAACTCTTGT CTGGCCCAGG 
39751 GTCAACCGGA TGCACATCAG TCCCAACAAT CGAAACGCCA TCCACCCTGG 
39801 GGACCGCATC CTGGAGATCA ATGGGACCCC CGTCCGCACA CTTCGAGTGG 
39851 AGGAGGTAGA GTGTGTGTCT AATCTGTCTT GTGAGGGTGG GACATGGAAC 
39901 AGATCCTCTG GGAAATCAGG CTGTAGCCTT TACCTTTTCC TACCCCCAGC 
39951 CCATCTCTTT GTCTTAGCAT TGAGCCTGTG ACCACTGGTG ACCTATTTCA 
40001 GCGTAACAGG TTCCCAGGGT AGCAGGGATG GTTGATGGAC GGGAGAGCTG
40051 ACAGGATGCC AGGCAGAGGG CACTGTGAGG CCACTGGCAG CTAAAGGCCA 
40101 CCATTAGACA AGTTGAGCAC TGGCCACACT GTGCCTGAGT CATCTGGGTT 
40151 GGCCATGGGT GGCCTGGGAT GGGGCAGCCT GTGGGAGCTT TATACTGCTC 
40201 TTGGCCACAG GTGGAGGATG CAATTAGCCA GACGAGCCAG ACACTTCAGC
40251 TGTTGATTGA ACATGACCCC GTGTCCCAAC GCCTGGACCA GCTGCGGCTG 
40301 GAGGCCCGGC TCGCTCCTCA CATGCAGAAT GCCGGACACC CCCACGCCCT 
40351 CAGCACCCTG GACACCAAGG AGAATCTGGA GGGGACACTG AGGAGACGTT 
40401 CCCTAAGGTG CCACCTCCCA CCCTGGCTCT GTTCTGTCCT ATGTCTGTCT 
40451 CTCGGATGAA GCTGAGCTGG CTTTCAGAAG CCTGCAGAGT TAGGAAAGGA 
40501 ACCAGCTGGC CAGGGACAGA CTATGAGGAT TGTGCTGACC CAGCTGCCCC 
40551 TGTGGGGATC ACAGTTTACA GCCAGAGCCT GTGCGGACCC AGCTGTCTGC 
40601 CAGGTTTCCT TAGAAACCTG AGAGTCAGTC TCTGTCCACT GAACTCCTAA 
40651 GCTGGACAGG AGGCAGTGAT GCTAAACCCT GAAGGGCAAC ATGGCCTATG 
40701 GAGAAAGCAT GGAGCTCAGA GCCTGGAGTA CGGGCACAGA TAGGATTGAA 
40751 TAAATTGTGT AGAAAGACTT TGAAAACAAT AAAGCAAAAG ATGAATGAAC 
40801 GTTTTTTTTA GACTTGAGGG ACCAACAACC CCCAAACCCC AGATTCTGCC 
40851 AGGTCCATGG GGAAGGAGAA GTTGCCTTGA GTGGAAGCCC CAAGTAGGGA 
40901 GACTTACAGA AAAGAAGTCA AGAGCACTGG CTCCCAGGCA 6AAATACTGA 
40951 TACCCTACTG GGGCTTCAGG CTGAGCTCCT CCCTTCACAA ATCACTTCAT 
41001 CTCTCTGAGC CTGTTTCTGC ATCTGTGACA TAAGATGGTA AGATAAAGGT 
41051 GGCTGTCTCA CCAATTATGT AAGGATTAAA TGTGGAAAAG GACATAAAGT 
41101 TGTATAGTGC TGCCATAGGG ACAGTGTTCA GTAAACGTGA CACATTCTTA 
41151 GTATCACTAA GAATCAGGTT CTTGGCCAGG CACCGTGGCT CATGCCTGTA 
41201 ATCCCAACAC TCrGGGAGGC CTAGGTCGGA GGATGGCTTG MCAGAGGAG 

41251 TTTGAGACCA GCCTGAGCAA CATAGTGAGA CACTGTCTCT ACAAAAAAAA

41301 AATMWTA ATAATTGTTT TTAATTAGAT GGGCAGGGCA CTGTGGCTCA 
41351 CACCTGTAAT CCCAGCACTT TGGGAGGCCA AGGCCGGAGG ATTGCTTGAG 
41401 GCCAGGAGTT CAGGAGCAGC CTGGGCCACA TTCCTGTCTC TACAAAGAAT 
41451 AAAAAAGTTA ACTGGGCATG GTGGCACATG CCTGTAATCC CAGCTACTCA 
41501 AGAGGCTGAG GAGGAGGATT GCCTGAGCCC AGGAGTTCAA GACTGCAGTG 
41551 AGCCTTGATC ACACCACTGT ACTACAGCTT GGGCAACAGA GTGAGACCTT 
41601 GTCTCCAAAA AAAAAAGTTT GTTTTTTTTT ATCCACTCTC CTCACCAAAC
41651 AAACTGAGTA AGTTAGAGCC CTCTCAGCTG GCATGTG7TG GAAACAGTGC 
41701 CCTCTCATTA AAGTGCTGCC CTCACTCCCA TTGCCTCTTG GCCTTGGTCA 
41751 GTATGATGAA ATTAGTGGGA GGCAGGGCAA CAGAGGGCAG GGAAGAGCTA 
41801 GAAATCCATG GCCTGGAAAA GGGAAGATTT GGGAGTGGCC AGGTATCTGT 
41851 AGAGCCACCA TGCAGAGGAG GGGGGCAGCT AGCCTTGTGT GCTCTGGTGG 
41901 GCATGGTCAG CAGGAGGCAG AGCAAAAGGA CAAGGGJAAG TAAACCTGTA 
41951 G^CG^CA AGCCMGAGC ^

42001 AAGTAAAGCA GGAGCATACC CCAGAGAGAA AGmGCAGG GCTCTTCAeC 
42051 TGCAGTGCTG TGGACTTCAA CCTTCTTGTT CCTTCTTCAG TMCTGAAAA 
42101 TAACAGTCAT TGACCATGAC TATTATCGAC CGCTTTTGAA AATGTAAACA 
42151 TAGTGACTTT ATTGCTGTAA AAATCATACG TGTTTATCAT CTTAAAATTC 
42201 AGGAAACATG GACAGGTACA AAGATGTGCA AAATATCATC CAAAATCCCA 
42251 TTTGCTGGCC AGGCACGGTG GCTCACGCCT GTAATCCCAG CACATTGGGA 
42301 GGCCGAGGCG GGCAAATCAC TTGAGGTCAG GAGTTTGAGA CCAGCCTGGC 
42351 CAACATGGTG AAACCCTATC TCTACTAAAA ATACAATAAT TAGGCTGGGC 
42401 GCAGTGGCTC ACGCCTATAA TCCCAGCACT TTGGGAGGCC GAGGTGGGCG 
42451 AATCACAAGG TCAGGAGTTT GAGACTAGCC TGGCCAATAT GGTGAAACCC
42501 CATCTCTACT AAAAATACAA AAATTAGGGC C6GGTGTGGT GGCTCACGCC

42551 TGTAATCCCA 6CACTTAGGG AGGCCGAGAC AGATGGATCG CGAGATCAGG 
42601 AGTTCGAGAC CAACCTAGCC AACATGGTGA AACCCCATCT CTACTAAAAA 
42651 AATACAAAAA TTATTCGGTT GTGGTGGCAC ACGCCTGTAA TCCCAGCTAC 
42701 TTGGGAGGCT GAGGCAGGAG AATCTCTTGA ACCTGGGAGG CAGAGGTTGC 
42751 AGTGAGTGGA GATCCCGCCG TTGCACTCCA GCCTGGGCGA GAGAGTGAGA 
42801 CTCCATCAAA AAAAAAAAAA AAAAAAAAAA AAATTAGCCG GGCGTGGTGG 
42851 CGTGCACCTA TACTCCCAGC TACTTGGGAG GCTGAGGCAG GAGAATCGCT 
42901 TGAACCTGGA AGGCGGAGGT CGCAGTGAGC CGAGATCGTG CCATTGCACT 
42951 TCAGCCTGGG CGACAGAGCG AGACTCTGTC TCAAAAATAA TAATAATAAC 
43001 AATAACTAGC CGGGCCTGGT GGCACATGCC TGTAGTCCCA GTTACTCAGG 
43051 AGGCGGAGGC ATGAGACTCA GGTGAACTAG GGAGACAGAG GTTGCAGTGA 
43101 GCCAAGATCA CACCACTGCA CTCCAGCCTG GTTGACAGAG CGAGACTCTG
43151 TCTCAAAAAA AAAAAAATCC CATTTGCTCA TTTTTTGGAT ACTAGTATAA 
43201 CTATCACTCT AAACCAGTTA GTACTTAAAT CAAGCAGATA TGGGAGATGG 
43251 TGAATTACCA TCTACAGTGT TGTCATATAT GTCACATACT GAGCATTATC 
43301 AGCTAGTAGA ATCTAGTTAA TTGTTCTATG TGTGATGTAT GCAGAGTTCC 
43351 CATTTTGAAT GTGTTTTTAC TATGCTTAAA TAAATGACTG ATGTCAGCAA 
43401 CCCCAAAATG ATACATCTGA TGTAAGAGCC CCTGTTCCCC AATAATAACA 
43451 TCTAAACTAT AGACATTGGA ATGAACAGGT GCCGCTMGT TTCCTCCCTC 
43501 CAGGGTTTCT TGGCCGGTCT CTGAGGACTA CACATCCCTA CTCCCGTCTT 
43551 TCCTCATCTT CAGGCGCAGT AACAGTATCT CCAAGTCCCC TGGCCCCAGC 
43601 TCCCCAAAGG AGCCCCTGCT GTTCAGCCGT GACATCAGCC GCTCAGAATC 
43651 CCTTCGTTGT TCCAGCAGCT ATTCACAGCA GATCTTCCGG CCCTGTGACC 
43701 TAATCCATGG GGAGGTCCTG GGGAAGGGCT tCTTTGGGCA GGCTATCAAG 
43751 GTGAGCGCAG GCAACAATTG CTTTGCTCTT CTGCCCCCAG TCCCTCTGTG 
43801 ACTGTCTTTC GGGGATTTCT CATCACTTGG CCCCACCCCA CACCATGCAG 
43851 GATGCCAGGC CTCCTTCCTG GCTTTGGGTG TTGGTGTGAG AGGTATCCTT 
43901 CACCCCCACC CAGGCCACCT MGGTCAATG TTGCTGTTAC AGTGAGCTTG 
43951 TGGACCTGGA GATCCAGGTT GGGTTGAGCT GTGCCTGTGG CCCTCCTGCC 
44001 TCCAGTCAGT GGGTGTTTGT TAGGTGCCTG CAGACCTCAG TACCGGGCAT 
44051 GCTACAAGGA GCACACAGGG GAATGGCTCC TGCCTCCCTG GTGAACAGTC 
44101 TCAGGGACTA ACCTCTCTCT TTCTCTCCTC CTCCTCCTCT TCTGCTGAGA 
44151 ACTGGGAGGG GGGGTCAGGT AAGACGTGTG TCTCAGCTTG GGGGCAGCAG 
44201 GGCTGGAGAG CTCACCCCCG ATCCACCCAG CTCCCTGGTG CATGTCTTTG 
44251 GCACTGACCT TCCTGCCCCC AGACTTCTGT TCACTCAGGA GACTCACTTC 
44301 TATGCCAAAT GACCAGAGCC CCTGCTTGGC TTGGCAGCAT CCCCTCCTGC 
44351 CTTCTTCCCC ACTTCCCTTT TCTGGGTTCT TGCCTGTCCT CTGTGCATGC 
44401 CCAGCTCTCC AGGAAAGAGG GTTTGCTTCC GTGTGAGTCC CATGTTGCTC 
44451 CACGCTGCAT C7TCCACACA TGMCTCTGT CATTCTGACC CGGCTCAGTG 
44501 TGCCCTCCAA GGGATGGGAT GGCCAGCTGC ATAGATTTTC TCAAACAGTT 
44551 CTCCAGAACT TCCTCTGGTC TCAGCACCAT TAACAGTCAC CCTCCCTGTA 
44601 GGTGACACAC AAAGCCACGG GCAAAGTGAT GGTCATGAAA GAGTTAATTC 
44651 GATGTGATGA GGAGACCCAG AAAACTTTTC TGACTGAGGT AAGAAGATGG 
44701 AGGGGGCCCG GGAGGTTGGT GTCACCATTG GAAGAGAGAA GACCTTACAA 
44751 ATAATGGCTT CAAGAGAAAA TACAG7TTGG AATTACTGTC TTAAAGACTA 
44801 AGCAGAAAAG AGCCCTAGAG GAATATCCCA CTCCCTCTAA ATTACAGCGT 
44851 AATTATTTGT TCAATGAACA CTTACTAAAA GCAACACAAA CAGGGTACAA 
44901 GGGATGCAGT AACAAAAGAT ACAGGGTTCA GAAGAGCTCT CAGG7TATGA 
44951 GGATGATGGA CATGAAAACA CTCCAATTTA GTACAACTCA ATGTTATAAT 
45001 CCTCACCTGA AC6CCCTGCT AAGGGAGCCT GGAGGGGAGC TCCCTGAGCA
45051 CTCACACTCC TTGGGCATTT ACAGTTTTCA CTACCCCTCC CAAGTTACTT 
45101 CATGGAGTAA CTTAAGTTGG GGACACCTGT GGTCTGGGTA TTGCCCTCCA 
45151 AGCCACTTGG CCACTCCCAC CCCAGTTCTC CCAATGCAGT TCCAAGGGTA 
45201 AGGCCTATGA AGCCATCTCC ATCTATATGG TGGJGGTCTT CCCTCATCCT 
45251 GATCTTAGTG CCCTGTCATA TCACAAGATA GGAGGTAGGA GATACAGGTG 
45301 GTAACACTTG TCAAGCTGAT TCCTTGGAGG GAAGAGGTAA GGAAGACAGT 
45351 GAGAAGTTAA CCACCAGCTT TCCTTGGCTT CCCCCACCCC CAGGTGAAAG 
45401 TGATGCGCAG CCTGGACCAC CCCAATGTGC TCAAGTTCAT TGGTGTGCTG 
45451 TACAAGGATA AGAAGCTGAA CCTGCTGACA GAGTACATTG AGGGGGGCAC 
45501 ACTGAAGGAC TTTCTGCGCA GTATGGTGAG CACACCACCC CATAGTCTCC 
45551 AGGAGCC7TG GTGGGTTGTC AGACACCTAT GCTATCACTA CCCTAGGAGC 
45601 TTAAAGGGCA GAGGGGCCCT GCTTTGCCTC CAAAGGACCA TGCTGGGTGG 
45651 GACTGAGCAT ACATAGGGAG GCTTCACTGG GAGACCACAT TGACCCATGG 
45701 GGCCTGGACC ACGAGTGGGA CAGGGCTCAA CAGCCTCTGA AAATCATTCC 
45751 CCATTCTGCA GGATCCGTTC CCCTGGCAGC AGAAGGTCAG GTTTGCCAAA 
45801 GGMTCGCCT CCGGAATGGT GAGTCCCACC AACAAACCTG CCAGCAGGGC 
45851 GAGAGTAGGG AGAGGTGTGA GAATTGTGGG CTTCACTGGA AGGTAGAGAC 
45901 CCCTTCCTAT GCAACTTGTG TGGGCTGGGT CAGCAGCTAT TCATTGAGTT 
45951 TGTCTGTGTC ACTGAAACTG ACCCCAGCCA ACTGTTCTCA GTTCACAGCC 
46001 CTGTTTTCAA AGAATTACAC ATCTCTAAAG GCAAACAGGG CACGGACAAG 
46051 GCAAACTGGA GAGGCAAACT GTAGCCTGAG ATGGCCTGGG CTTGCCATCA 
46101 CAGGTATTCA GGTGCTGAGG GCCCTTAGAC CAACTAGAGC ACCTCACTGC 
46151 CTAGGAAATC AATGAAGGGG AAATGAGTTC TAGCGGAGCC CTGAAGGATC 
46201 AGAAT^T MAGH^ 

46251 AGGAGCAAAG ACCTGGGAGG AAAGAGGAGA AAATCATCTA tttcacctgg 
46301 AAACAAATGA TTCCAAGCAT AGAAATAATA ACAGCTGACA AGTACTGAGT 
46351 GCCCTCTATA TGCTAGGCAC TGGGCTGAGG GATTAACATG CATGTGCATG 
46401 TTTA7TCCTC ATGACAACCT TGGTTTCCAG ATAAGCTGGA CTGGAAAGGG 
46451 ACAGAGCTGG GATCCTGGGC TAATCAGTCT GGTCGCCAAG CCTGAGAC7T 
46501 CCTTCACATG GGGGTCCATG MAATAGTAG TAGTCTGGAA 

46551 CAGTTTGGGG GTACATCAAG GTC GCTGTGT TTTAAGCTAT GGAGTCT G6A 
46601 CTA?AGGAGA CAAATGTAAA AGAGTnTTT &GTTGACTGG CTHT^T 
46651 TTTTTGTTTG TTTGTTTGTT TGTTTGTTTG TTTGTTTGTT TnTCCTGTT 
46701 TCTGGGGCTT GAATCAGGAA GGAGGTTTTT TTGTTGTTGT TGTTTTGAGA 
46751 AAGGATATTG CTCTGTTGCC CAGACTGGAG TGCAGTGGCA CGATCATGGC 
46801 mGACCTCC TGGGCTCAAG CAATCCTOT GCCT7AGCCT 
46851 CCCAAGTAGC TGGACTACAG GTGTGTACCA CCACACCTAA TTTT1TGAAT 

46901 mTTnrcT i iimuii tttttttttt ggtagagaca gghctcact 

46951 TTGTTGCCCA GGCCTGAATC TCAAACTCCT GGGCTCAAGC ATTCCTCCTG 
47001 CCTCGCCCTC CGAAAGTGTT GGGATTACAG ^T^J^^^ TTTTTrrftAA 
47051 CAGGAAAAGA TTTTTAAGCA AGAAAGCTTA AGAGCTGTGG TTTnCCAAA 

47101 atqStctgg gctggcacag tggctcatgc ctgtaatccc agcacttttt 
47151 tgggaggccg aggtgagtgg atcachgag gtcaggagtt tgagaccagc 

47201 CTGGCCAACT GGTGAAACCC CTGTTTCTAC TAAAGAAAAA AATGCAAAAA 
47251 TTAGCTGGGC CTGGTGGTGC ACGCCTGTAG TCCCAGCTAC TCAGGAGGCC 
47301 SSScS AATAGCTTGA ACCTGGGAGG CAaAAGTTGC AGTGAGCCAA 
47351 GATCACACCA CTGCATTCCA GCCTGGGTGA CAGAGTGAGA CTTCATCTCA 
47401 AAAAAAAAAA AAAAGAGAGA CTGATATGGT TAGTACATTG GGGTGGAATG 
47451 CGGAGGGTCC AGGGAATGGA GCCCTGCATA GGGGGCTAAT GAAACATTTC 
47501 AGATTTCTGA ATTAAGGTAG TGGCTGTGGG GACAGGAGCC TGGGAGGCAG 
47551 GGTGGAGTCA 6AATGGAGAG ACTGGTTGGC AATGAGGGAA CAGGAGGAGG 
47601 AGGAGGAGGA GTTACGAGTG GCTTGAGGTG TCACTTACCA GACATTTGGG 
47651 GGATGGGGGA TAGCCGTGAT TGTTGAGCAA CTGGTTTGGG AAGAGCTAGC 
47701 ATTGATCCCT GCTGTTCTGT GCTAGCAGAA CCTATCAGCA TCTTCTGGGC 
47751 AGGAAACTGG CTCCATGAGA CTGGCTTAGG GAGAGGCTGC TAGTCACCTA 
47801 ATCTGCAGAG AAGGGGCAGC TGGAGCTGTG GGACAGAAGA GGCATCCATG 
47851 TAGCTGGTGG GGGTGTCTCA GCTTGTGAAG AGGAGATGGC TTTGAGCAGG 
47901 GCTGACACTG AAAAGGCTGG AAGAAAAAAA CAGACACACA AGAGTCTCAG 
47951 GATCAGGTAG CATAGGAAAG TTGTGGACAG TCTTTGAGGA GCACTCCCTC 
48001 AGGCAGGCAG GCAGGCAGGT CATGAGCTAT AGCGATTCAG GAAGAGCTCC 
48051 CTGGGTGTGT GAGCAGCTCC AGGAGCCTM GGGATGAAAG TAGTATTGCA 
48101 GGGGGCTGGA GAGCAAGGAG TGGCTCCTTC TACATTTGCA AGGGAAGGAG 
48151 AAAG6AAGTT GCTCCTGAGA GTGGTAAGAG TCAGTGGTGG AGGCCTGGAG 
48201 AGGAGACATA ACAAACAAAT TTGTTGACAA ACATTTTGGT AGGAAGGGGG 
48251 AGAGCTTAAA GTTTAGACAG TGGGGAAGGT GGAGTCTTAG AGGAGGTGAA 
48301 TGTCTGAAAG ACAGAGCTAG CTGGAGCAAG AAGTCACTTC TCTGTTGCAG 
48351 GCAGGAAGGA TCCAAAGTGG CTCAAGCCAG AGATTGGGAG AGTGGGGAGG 
48401 AGGGAGCAGC CTGGATCTAA GTAAAATGGG TAGAGGTGGA GGGGGTGCTG 
48451 CAACGGCCAG GGTTTTCTGA AG7TGGGGAC ATTAGGAGAG AGCTGTGAGG 
48501 GCTTTGGCCA GCCACTGTGC TAGTGATTGG TGAACCAAAG GATGGGCAGG 
48551 AGATGGCAGC AGGGAAGCAG AGGAAGTCCA GGCTTCCTGT TGGTATTGGG 
48601 ACAAGGGAGA GGCCATAGGA GGCCCTGGCC CTGTTGTCCA GGTTGGGTTC 
48651 TGAAGCTGGG TGGGCATGGC CTGGTAGGAG AGCATCTATG GCGCCCAATT 
48701 CCAGATTCAG GGTCTAGTTG ATTTGCTGGC CCTGTAGCCT CAGCTCATGG 
48751 TTCTGTTCCA GGCCTATTTG CACTCTATGT GCATCATCCA CCGGGATCTG 
48801 AACTCGCACA ACTGCCTCAT CAAGTTGGTA TGTCCCACTG CTCTGGGCCT 
48851 GGCCTCCAGG GTCCTATCCT TCCTGGCTTC CTTGTCACAA AGGAGGCTGA 

48901 CTTGTCCCCT CTGGCTAGAG GGCAGAGGTG TTGCCTAGGA GCTCCTATCT 
48951 TTCCCTTCCT GCTTCTTCCA ATGCCCTTCT CTGTCCTCTG GGAGCTCCGA 

49001 GACACACACA GACATAATTT CACCTTCTCT CATTAGCAAC CTTTGAAATA 

49051 ATTTGATTAG AAGGGACTTC AGAAGTTTGT TGACTATATG TAGAAAACCC 

49101 TGTCATTTTA CCTGCTTTTG CCCCATAGTA GTCTTGTAAA ACAGTTCATT 

49151 GCTGACCCCA 7TTTACAGTG GTGGCACCTG MGCCTCAGC CTGAGGCCAC 
49201 CGAGCTAGTA AATTTACAGG GACCAGTTTG AGACCAGCAT TCCTCCCACT 
49251 GCCCCTCAGC TGTGGTGGTT ACAATGTTGT TTGTCTTACT GACTTGCTAT 
49301 CTGGCTTCCT GGGTGTCTAC CGGCTGGCCC TGGCTGTGCC CTCTAGACCC 
49351 ACACCACGCA ATCTTCATTC CTTTCCCACA TGACTGCCCT GTAGCTATTC 
49401 AAAGAGCTTG TCTCCCCCAA GTCTCCCCAT CTACTGGCTC CACCTTGCCT 
49451 TTTTCTGTCT TATCCTGGTT CTAGCCACTG CCTGAAATCA TFTTAGGAAT 
49501 AAGACAGGAC AGGGAAAAAC AAAAGCAACC CCCTGTCCCA CCTCTGAGTT 
49551 CCACTCTCCA AGTCCCTGAG CCTCACCTCC AGGGCTCCAG TGGCTCTGCC 
49601 ATGAACCCAC TGTGGGCTGG GAGTCTGCTG TGCACAGATA CCAGACCCTC 
49651 AGAAACACAA ATGCCAAGTG TGTCTGTTTT TTTGTTTTGT TTTGTTTTGT 
49701 TTTTTAGATG GAGTCTCATT CTGTTTCCCA GGCTGGAGTG CAGTGGTGCA 
49751 ATCTTGGC7T ACTGCAGCCT CTACCTCGCG GGTTCTAGTG ATTGTTCTGC 
49801 TTCAGCCTCC CAGTAGCTAG GACTACAGGC GTGTGCCACC ACGCCCAGCT 
49851 A AII II II II MINIUM TGTATTTTTA GTAGAGACAG GGTTTTGCCA 
49901 TGTTGGCCAG GCTGGTCTTG AACTCCTGAC CTCAGGTGAT TCACCCGCCT 
49951 TGGCCTCCCA AAGTTCTGGG ATTACAGGTG GAAGCCACCG TGCCTGGCCT 
50001 GAGTGTGTCT ATTTGATAGA GCTTTCTGCT CTGATTCTCC CTTGCTATAC 
50051 ACCTTTTCTC CCCTTCTCAG TGGCTTCTCT TGCCTATGCT TCCTCCCCAG 
50101 GGCCAGGTTT GAGAACATCC CCATGAAGTC CTGACCTGTC TTTTATCCTA 
50151 CCAGGACAAG ACTGTGGTGG TGGCAGACTT TGGGCTGTCA CGGCTCATAG 
50201 TGGAAGAGAG GAAAAGGGCC CCCATGGAGA AGGCCACCAC CAAGAAACGC 
50251 ACCTTGCGCA AGAACGACCG CAAGAAGCGC TACACGGTGG TGGGAAACCC 
50301 CTACTGGATG GCCCCTGAGA TGCTGAACGG TGAGTCCTGA AGCCCTGGAG 
50351 GGGACACCCG CAGAGGGAGG ACAGATGCTG CCCTTGCATC AGAGCCCTGG 
50401 GAATTCCAGG GGAGGCCTGT GAAGCGTAGG ACCGGATACC CAGAGCTGAG 
50451 GATATTTTTC CCTTGCCAGG TGGGGCCTCA CGATTTAGCT CCTGAGCTCA 
50501 GGGGGCTGGG MCTGATCAG TGTCCCATCA TGGGGGATAA GGTGAGTTCT 
50551 GACTGTGGCA TTTGTGCCTC AGGGATCGCT AAGAGCTCAG GCTATTGTCC 
50601 CAGCTTTAGC CTTCTCTCTC CATGGTGAGA ACTGAAGTGT GGTGCCCTCT 
50651 GGTGGATAAT GCTCAAACCA ACCAGAGATG CTGGTTGGGA TTCTTGAAAT 
50701 CAGGGTTGTG AGGCCTCAGA AATGGTCTGA ATACAATCCA TTTTGGAGTC 
50751 TGAGGCCCAG AGAAGTTCAG TGAATTGCCT AGGAGCATAC AGCTGCCTAA 
50801 TGGCAGAGGC TAGATGAACC CTAGTCTGGT TCTTTTCCAC TTTAACGTGC 
50851 AGTTTCATCC TAGGCAGTGT TATGTTATAA GGGCTCTCCA AGGCAGTTCA 
50901 CCTACGGCTG AGGAAGGACT ATTTTCAGGT GGTGTCTGCG CAGGACAGCC 
50951 TGTGGGGTGT CCCTACAGAA CCTGTTCTAG CCCTAGTTCT TAGCTGTGGC 
51001 TTAGATTGAC CCTAGACCCA GTGCAGAGCA GGTAAGGGAT GTAAACTTAA 
51051 CAGTGTGCTC TCCTGTGTTC CCCAAGGAAA GAGGTATGAT GAGACGGTGG 
51101 ATATCTTCTC CTTTGGGATC GTTCTCTGTG AGGTGAGCTC TGGCACCAAG 
51151 GCCATGCCCG AGGGAGCAGG CCTAGCAGCT CTGCC7TCCC TCGGAACTGG 
51201 GGCATCTCCT CCTAGGGATG ACTAGCTTGA CTAAAATCAA CATGGGTGTA 
51251 GGGTTTTATG GITTATAACG CATCTGCACA TCT7TGCCAC GTTCGTGTTT 
51301 CATTGGTC7T AAGAGAAGGA CTGGCAGGGT TTTTTTGTTT TAGATGGAGC 
51351 CTCACTTCGT TGCCCAGGCT GGAGTGCAGT GGCACAATCT GGGCTCACTG 
51401 CAACCTCTGC CTTCTGGGTT CAAGTGATTC TCCTGCCTCA GCCTCCCAAG 
51451 TAGCTGGGAC TACCGGCACA CACCACCATG CCCGGCTAAT TTTTGTATTT 
51501 TTAGTAGAGA CAGGGTTTCA CCATGTTGGC CAGGCTGGTC TTGAACTCCG 
51551 GACCTCAGGT GATCCGCCTG CCTCAGCCTC TAAAAGTGCT GGAATTAATA 
51601 GGCGTGAGCT ACCTCGCCCG GCCAGGTTTT liliiniii TTTTTAGTTG 
51651 AGGAAACTGA GGCTTGGAAG AGGGCAGTGG CTTGCACATG GTCGATAAGG 
51701 GGCAGATGAG ACTCAGAATT CCAGAAGGAA GGGCAAGAGA CTGTTCATGT 
51751 GGCTGTCTAG CTAGCTCTTG GGCCAAATGT AGCCCnCTC AGTTCCCTTC 
51801 AAGTAGAAGT AGCCACTCTA GGAAGTGTCA GCCCTGTGCC AGGTACCACG 
51851 TGGACAGAGT GAGGAATCTT GGAAAGATTC CTACCTTTAG GAGTTTAGTC 
51901 AGGTGACAGC ATATCTCAGC GACTCAAACA CACACACATT CAAAGCCTTC 
51951 TGTAATTCCT ACAAAGTTGT GAGGGGTAGA GGAGAGGAGA GACAAGGGAT 
52001 GGTTAGGATA ATGAAGGAAT GTTTTGTTTT TGTTTTTGTT TTTGAGATGG 
52051 AGTTTCACTC TGTCACCCAG GCTGGAGTGC AGAGGTGCAA TCTTGGCTCA 
52101 CTGCAGCCTC CGCCTCCCAG GTTCAAGCAA TCCTCCTGCC TCAGCCTCCC 
52151 AAGTAGCTGG GACTACAGGT GTGCGCCACC ACGCCTGGCT AATTTTTGTA 
52201 TTTTCAGTAG AGACAGGGTT TCGCCATATT GGCCAGGCTG GTCTCAAATG 
52251 CCTGACCTCA GGTGATACAC CCGCTTCAGC CTCCCAAAGT GCTGAGATTA 
52301 CAGGCATGAG CTACCGTGCC TGGCCATGAA GGAAGATTTG TTTTAAAAAA 
52351 TTGTTTTCTT TAATATTAAT TGAACACCTC TGTTCAGAGC ACTGGGCTGG 
52401 TGCCAGAGGG TTTCAGACAT GAATCAGATC CAGCACCTCA TAGAGCCTTA 
52451 ATCTGGCACA CACACACAGC CACAAGGAGA CACAGACAAG GCAGGGTAGG 
52501 ATGAGTGGAA GCTAGGAGCA GATGCTGATT TGGAACACTT GGCTTCTGCA 
52551 GTGAAGCCCC TTCTTAGTCC TCTTCAGTAA CCCAGCTCTC AGTGGATACA 
52601 GGTCTGGATT AGTAAGATTT GGAGAGATGA TTGGGGATTG GGGAGAGCTC 
52651 TCTAACCTAT TTTACCACCT CCTCTTCTGC CATTCTTCCT GTCCACATCC 
52701 CCAGCATCCC TTTCCCTTGC GAAGTATCTG TGGCCTCTGT AGTCCTTTGT 
52751 AAACAGCTGT CTTCTTACCC TACAGATCAT TGGGCAGGTG TATGCAGATC 
52801 CTGACTGCCT TCCCCGAACA CTGGACTTTG GCCTCAACGT GAAGCTTTTC 
52851 TGGGAGAAGT TTGTTCCCAC AGATTGTCCC CCGGCCTTCT TCCCGCTGGC 
52901 CGCCATCTGC TGCAGACTGG AGCCTGAGAG CAGGTTGGTA TCCTGCCTTT 
52951 TTCTCCCAGC TCACAGGGTC CTGGGACGTT TGCCTCTGTC TAAGGCCACC 
53001 CCTGAGCCCT CTGCAAGCAC AGGGGTGAGA GAAGCCTTGA GGTCAAGAAT 
53051 GTGGCTGTCA ACCCCTGAGC CATCTGACAA CACATATGTA CAGGTTGGAG 
53101 AAGAGAGAGG TAAAGACATA GCAGCAAGTA ATCTGGATAG GACACAGAAA 
53151 CACAGCCATT AAAAGAAAGT TTAAAAGAAG GAAATTCACC CAAACCATTT 
53201 GAATACAGTA AGTGTATTCA TCTTTCGATA TTCCCCTGTC CATATCTACA 
53251 CATATACTTT TTTTTATAGT AAATAGTTCT GTATTTTGCC CTGCATTTCC 
53301 CTTGTGTTTA CTATCCAGTC TTCCTGTTTA TCATmTGT CGACAACATG 
53351 AAATTCTATT GAGAGACTGT CTGAACATAT TGTAATGTAG ATGTTCAGGT 
53401 TTTTCCAGTT TCTCTTTACA ATAGGTATTT AACTACAGTG AGCAGTTTTA 
53451 TGCATTTAGC TAATTTCTCC TTTGAGGAAG TATTTTCAAA AtTACCTTTA 
53501 TTCTTCTCAG GTAATAATTT CATTATTACC AAAGTTACCC TAGGTCTTTT 
53551 CAAGTGTGTG GTTAAAAAAC GAGAATCTGG CTGGGCGCGA TGGCTCACAC 
53601 CTGTAATCCC AGCACTTTGG GAGGCTGAGG CTGGTGGATC ACCTGAGGTC 
53651 TGGAGTTCGA GACCAGCCTG GCCAACATGG TGAAACCCCA TCTCTACTAA 
53701 AAATACAAAA CTTAGCCAGG CATGGTGGCA GGTGCCtGTA ACCCCAGCTA 
53751 CTTGGGAGGC TGAGGCAGGA GAATTGCTTG AACCCAGGGG CGGAGGTTGC 
53801 AGTGAGCCGA TATCACGCCA TTGCACTCCA GCCTCGGCAA CAAGAGTGAA 
53851 ACTCTGTCTC AAAAATGGGG TTCTTTTCCT GCCATCAAAA ATCATGTTTC 
53901 TTTTAAAAAC AAGTTCAAAC ATTACCAAAG TTTATAGCAC AGGAAATACG 
53951 TCTTCTGTAA TCTCCCTTAA CCAATATATC CCTCAACATT CTCCTCACCC 
54001 CCAACTCCAC CCTCCCAGGA TMCCAGTTG GGACATAATC TTTATTTAAA 
54051 AATGGTTTCC GGATAGAGAA AGCGCTTCGG CGGCGGCAGC CCCGGCGGCG 
54101 GCCGCAGGGG ACAAAGGGCG GGCGGATCGG CGGGGAGGGG GCGGGGCGCG 
54151 ACCAGGCCAG GCCCGGGGGC TCCGCATGCT GCAGCTGCCT CTCGGGCGCC 
54201 CCCGCCGCCG CCCTCGCCGC GGAGCCGGCG AGCTAACCTG AGCCAGCCGG 
54251 CGGGCGTCAC GGAGGCGGCG GCACAAGGAG GGGCCCCACG CGCGCACGTG 
54301 GCCCCGGAGG CCGCCGTGGC GGACAGCGGC ACCGCGGGGG GCGCGGCiSTT 
54351 GGCGGCCCCG GCCCCGGCCC CCAGGCCAGG CAGTGGCGGC CAAGGACCAC 
54401 GCATCTACTT TCAGAGCCCC CCCCGGGGCC GCAGGAGAGG GCCCGGGCTG 
54451 GGCGGATGAT GAGGGGCCAG TGAGGCGCCA AGGGAAGGTC ACCATCAAGT 
54501 ATGACCCCAA GGAGCTACGG AAGCACCTCA ACCTAGAGGA GTGGATCCTG 
54551 GAGCAGCTCA CGCGCCTCTA CGACTGCCAG GAAGAGGAGA TCTCAGAACT 
54601 AGAGATTGAC GTGGATGAGC TCCTGGACAT GGAGAGTGAC GATGCCTGGG 
54651 CTTCCAGGGT CAAGGAGCTG CTGGTTGACT GTTACAAACC CACAGAGGCC 
54701 TTCATCTCTG GCCTGCTGGA CAAGATCCGG GCCATGCAGA AGCTGAGCAC 
54751 ACCCCAGAAG AAGTGAGGGT CCCCGACCCA GGCGAACGGT GGCTCCCATA 
54801 GGACAATCGC TACCCCCCGA CCTCGTAGCA ACAGCAATAC CGGGGGACCC 
54851 TGCGGCCAGG CCTGGTTCCA TGAGCAGGGC TCCTCGTGCC CCTGGCCCAG 
54901 GGGTCTCTTC CCCTGCCCCC TCAG TTTTCC ACTTTTGGAT 1 1 1 1 1 IATTG 
54951 TTATTAAACT GATGGGACTT TGTGTTTTTA TATTGACTCT GCGGCACGGG 
55001 CCCTTTAATA AAGCGAGGTA GGGTACGCCT TTGGTGCAGC TCAAAAAAAA 
55051 AAAAAAAAAT GATTTCCAGC GGTCCACATT AGAGTTGAAA TnTCTGGTG 
55101 GGAGAATCTA TACCTTGTTC CTTTATAGGC CAAGGACCGC AGTCCTTCAG 
55151 TMCACCAGT GTAAAAGCTT GAGGAGAAAT TGTGAAGCTA CACAGTATTT 
55201 CTTTTCTAAT ACCTCTTGTC ATTCTAAATA TCT7TAATTT ATTAAAAAAT 
55251 AWATATAC AGTATTGAAT GCCTACTGTG TGCTAGGTAC AGTTCTAAAC 
55301 ACTTGGGTTA CAGCAGCGAA CAAAATAAAG GTGCTTACCC TCATAGAACA 
55351 TAGATTCTAG CATGGTATCT ACTGTATCAT ACAGTAGATA CAATAAGTAA 
55401 ACTATAITGA ATATTAGAAT GTGGCAGATG CTATGGAAAA AGAGTCAAGA 
55451 CMGTAAAGA CGATTGTTCA GGGTACCAGT TGCAATTTTA AATATGGTCG 
55501 TCAGAGCAGG CCTCACTGAG GTGACATGAC ATTTAAGCAT AAACATGGAG 
55551 GAGGAGGAGT AAGCCTGAGC TGTCTTAGGC TTCCGGGGCA GCCAAGCCAT 
55601 TTOETGGCA CTAGGAGCCT GGTGTTTCCG ATTCCACCTT TGATAACTGC 
55651 ATTTTCTCTA AGATATGGGA GGGAAGTTTT TCTCCTATTG TTTTTAAGTA 
55701 TTMCTCCAG CTAGTCCAGC CTTGTTATAG TGTTACCTAA TCTTTATAGC 
55751 AAATATATGA GGTACCGGTA ACATTATGCC CATTTCTCAC AGAGGCACTA 
55801 ^GGTGAAG GAGTTTGCCT GACGTTATAC AACCAGGAAG TAGCTGAGCC 
55851 TASATCCCTT CCACCCACCC CATGGCCCTG CTCATGTTCC ACCTGCCTCT 
55901 AATTTACCTC TTTTCCTTCT AGACCAGCAT TCTCGAAATT GGAGGACTCC 
55951 TTTGAGGCCC TCTCCCTGTA CCTGGGGGAG CTGGGCATCC CGCTGCCTGC 
56001 AGAGCTGGAG GAGTTGGACC ACACTGTGAG CATGCAGTAC GGCCTGACCC 
56051 GGGACTCACC TCCCTAGCCC TGGCCCAGCC CCCTGCAGGG GGGTGTTCTA 

56101 caIcagcat TGCCCCTCTG TGCCCCATTC CTGCTGTGAG cagggccgtc 

56151 CGGGCTTCCT GTGGATTGGC GGAATGTTTA GAAGCAGAAC AAGCCATTCC 

56201 StSSto CCAGGAGGCA agtgggcgca gcaccaggga ^aatctatctc 

56251 CACA03TTCT GGGGGCTAGT TACTGTCTGT AMTCCAATA CTTGCCTGM 

56301 AGCT6TGAAG AAGAAAAAAA CCCCTGGCCT TTGGGCCAGG AGGAATCTGT 
56351 TACTCGAATC CACCCAGGAA CTCCCTGGCA GTGGATTGTG GGAGGCTCTT 
56401 GCTTACACTA ATCAGCGTGA CCTGGACCTG CTGGGCAGGA TCCCAGGGTG 
56451 MCCTGCCTG TGAACTCTGA AGTCACTAGT CCAGCTGGGT GCAGGAGGAC 
56501 TTCAAGTGTG TGGACGAAAG AAAGACTGAT GGCTCAAAGG GTGTGAAAAA 
56551 GTCAGTGATG CTCCCCCTTT CTACTCCAGA TCCTGTCCTT CCTGGAGCAA 
56601 GGTTGAGGGA GTAGGTTTTG AAGAGTCCCT TAATATGTGG TGGAACAGGC 
56651 W&SAGTTAG AGMAGGGCT GGCTTCTGTT TACCTGCTCA CTGGCTCTAG 
56701 CCAGCCCAGG GACCACATCA ATGTGAGAGG AAGCCTCCAC CTCATGTTTT 
56751 CAAACTTAAT ACTGGAGACT GGCTGAGAAC TTACGGACAA CATCCTTTCT 
56801 GTCTGAAACA AACAGTCACA AGCACAGGAA GAGGCTGGGG GACTAGAAAG 
56851 AGGCCCTGCC CTCTAGAAAG CTCAGATCTT GGCTTCTGTT ACTCATACTC 
56901 GGGTGGGCTC CTTAGTCAGA TGCCTAAAAC ATTTTGCCTA AAGCTCGATG 
56951 GGTTCTGGAG GACAGTGTGG CTTGTCACAG GCCTAGAGTC TGAGGGAGGG 
57001 GAGTGGGAGT CTCAGCAATC TCTTGGTCTT GGCTTCATGG CAACCACTGC 
57051 TCACCCTTCA ACATGCCTGG TTTAGGCAGC AGCTTGGGCT GGGAAGAGGT 
57101 TCTCAAAGCT GAGATGCTGA GAGAGATAGC TCCCTGAGCT 
57151 GGGCCATCTG ACTTCTACCT CCCATGTTTG CTCTCCCAAC TCATTAGCTC 
57201 CT^CAGCA TCCTCCTGAG CCACATGTGC AGGTACTGGA AAACCTCCAT 
57251 cffficOC AGAGCTCTAG ^CTCTTCA TCACAACTAG ATTTGCCTCT 
57301 TCTAAGTGTC TATGAGCTTG CACCATATTT AATAAATTGG GAATGGGTTT 
57351 G&GGTATTAA TGCAATGTGT GGTGGTTGTA TTGGAGCAGG GGGAATTGAT 
57401 SSct ggttgctgtt MTATTATCT TATCTATTGG GT&GTATGTG 
57451 AAATATTGTA CATAGACCTG ATGAGTTGTG GGACCAGATG TCATCTCTGG 
57501 TCAGAGTTTA CTTGCTATAT AGACTGTACT TATGTGTGAA GTTTGCAAGC 

57551 TTGCTTTAGG GCTGAGCCCT GGACTCCCAG CAGCAGCACA GTTCAGCATT 
57601 GTGTGGCTGG TTGTTTCCTG GCTGTCCCCA GCAAGTGTAG GAGTGGTGGG 
57651 CCTGAACTGG GCCATTGATC AGACTAAATA AATTAAGCAG TTAACATAAC 
57701 TGGCAATATG GAGAGTGAAA ACATGATTGG CTCAGGGACA TAAATGTAGA 
57751 GGGTCTGCTA GCCACCTTCT GGCCTAGCCC ACACAAACTC CCCATAGCAG 
57801 AGAGTTTTCA TGCACCCAAG TCTAAAACCC TCAAGCAGAC ACCCATCTGC 
57851 TCTAGAGAAT ATGTACATCC CACCTGAGGC AGCCCCTTCC TTGCAGCAGG 
57901 TGTGACTGAC TATGACCTTT TCCTGGCCTG GCTCTCACAT GCCAGCTGAG 
57951 TCATTCCTTA GGAGCCCTAC CCTTTCATCC TCTCTATATG AATACTTCCA 
58001 TAGCCTGGGT ATCCTGGCTT GCTTTCCTCA GTGCTGGGTG CCACC7TTGC 
58051 AATGGGAAGA AATGAATGCA AGTCACCCCA CCCC7TGTGT TTCCTTACAA 
58101 GTGC7TGAGA GGAGAAGACC AGTTTCTTCT TGCTTCTGCA TGTGGGGGAT 
58151 GTCGTAGAAG AGTGACCATT GGGAAGGACA ATGCTATCTG GTTAGTGGGG 
58201 CCTTGGGCAC AATATAAATC TGTAAACCCA AAGGTGTTTT CTCCCAGGCA 
58251 CTCTCAAAGC TTGAAGAATC CAACTTAAGG ACAGAATATG GTTCCCGAAA 
58301 AAAACTGATG ATCTGGAGTA CGCATTGCTG GCAGAACCAC AGAGCAATGG 
58351 CTGGGCATGG GCAGAGGTCA TCTGGGTGTT CCTGAGGCTG ATAACCTGTG 
58401 GCTGAAATCC CTTGCTAAAA GTCCAGGAGA CACTCCTGTT GGTATCTTTT 
58451 CTTCTGGAGT CATAGTAGTC ACCTTGCAGG GAACTTCCTC AGCCCAGGGC 
58501 TGCTGCAGGC AGCCCAGTGA CCCTTCCTCC TCTGCAGTTA TTCCCCCTTT 
58551 GGCTGCTGCA GCACCACCCC CGTCACCCAC CACCCAACCC CTGCCGCACT 
58601 CCAGCCTTTA ACAAGGGCTG TCTAGATATT CATTTTAACT ACCTCCACCT 
58651 TGGAAACAAT TGCTGAAGGG GAGAGGATTT GCAATGACCA ACCACCTTGT 
58701 TGGGACGCCT GCACACCTGT CTTTCCTGCT TCAACCTGAA AGATTCCTGA 
58751 TGATGATAAT CTGGACACAG AAGCCGGGCA CGGTGGCTCT AGCCTGTAAT 
58801 CTCAGCACTT TGGGAGGCCT CAGCAGGTGG ATCACCTGAG ATCAAGAGTT 
58851 TGAGAACAGC CTGACCAACA TGGTGAAACC CCGTCTGTAC TAAAAATACA 
58901 AAAATTAGCC AGGTGTGGTG GCACATACCT GTAATCCCAG CTACTCTGGA 
58951 GGCTGAGGCA GGAGAATCGC TTGAACCCAC AAGGCAGAGG tTGCAGTGAG 
59001 GCGAGATCAT GCCATTGCAC TCCAGCCTGT GCAACAAGAG CCAAACTCCA 
59051 TCTCAAAAAA AAAAA (SEQ ID NO: 3) 



FEATURES: 

Start: 3000 
Intron: 52934-55922 
CHROMOSOME MAP POSITION: 

Chromosome 22 
DNA 

Position 

941 GAGTMGTGGGTGGTCAGGTTACA6ACTTMTT1TGG6TTAAAAAGTAAAAACAAGAAAC 
AAGGTGTGGCTCTAAAATAATGAGATGTGCTGGGGGTGGGGCATGGCAGCTCATAAACTG 
ACCCTGAMGCTCTTACATGTMGAGTTCCA7W\ATATTTCCAAAACTTGGAAGATTCAT 
TTGGATGTTTGTGmATTAAAATCTCTCACTMnCATTGTCnGTCCACTGTCCGTAA 
CCCMCCTGGGAnGGTnGAGTGAGTCTCTCAGACTTTCTGCCnGGAGTTTGTGAGAG 
[A.T] 

GATGGCATACTCTGTGACCACTGTCACCCTAAAACCAAAAAGGCCCCTCTTGACAAGGAG 
TCTGAGGATTTTAGACCCAGGAAGAATGAGTGATGGGCATATATATATCCTATTACTGAG 
GCATGAGMGAGTGGMTGGGTGGGTTGAGGTGGTGTTTTAAGGCCTCTrGCCAGCTTGT 
nAACTCnCTCTGGGGAACGAGGGGGACMCTGTGTACATTGGCTGCTCCAGAATGATG 
TTGAGCMTCnGAAGTGCCAGGAGCTGTGCTTTGTCTATTCATGGCCCCTGTGCCTGTG 

2612 TGAGnGGMCAGTHGATACCAAAACCATCCCCCCGCCCCCCAACCCCCAGCCTAGGGT 
CCGTGGAAAMTT<mCCTGGTGCCAAAMGG™^ 
mAnCMTGTTGGnGAGTAAATGAGCTCnGGAnAGGTGATGGAAAMTCTGAAAA 
MCAGGGCTTnGAGGMTAGGAAMGGCAGTMCATGTnAACCCAGAGAGAAGTTTCT 
GGCTGTTGGCTGGGMTAGTCATAGGAAGGGCTGACACTGAAAAGAAGGAGATTGTGTTC 

[G.A] 

TTTCnCnCTCAGAGCTATMGCAMGGCTGAAAGTTCTAGAAAMGGCMGTTTTGTT 

TCAGTAGAAAAMGGIATMTCAGAACWT7Tn"AGAAMTGGAATGAGACTACTTn 

GCCATGAGTTCCTTGTCCCTGGAGAGATGAGCAGAGGTTGGACAAGTGCTTACCAGAGAT 

CnGTG^GGCAGAAACTGTGCATCTAGCAGAGCATTGGCCTAACCCTTTCAAATGAGAT 

GCTGTTMCTCAGTCTTATTCTACATGGTAGGMTCCTGTCCCTTTGCCTCCTGCTACTT 

5080 A(^CGTAAMTAGnGAMTnGnGGTGGAM 

ATGGGCATGCCTGGCCCCCAAGGTCTGAAGTGGTAGGGCTGTGCCTATATCCTGAGAATG 
AGATAGACTAGGCAGGCACCTTGTGCTGTAGATTCCAGCTCCTGCACATAGCTCTTGTTG 
TAAMCATCCCTGTGCTTATACCAAGTMTTGAGTTGACCTTTAAACACTTGCCTCTTCC 
CTGGGAACCATATAGGGGATTGGCCTGGAGACGTCTGGCCTCTGGAAGAGTTGGAAAGCA 

[G.A] 

CCATCATTATTATCC7TTCCTTTCAGCTATMCTCAGAGCTCT(^GTC7TTTCTGTGGA 
TCmTTGCCntmCTrGCCCCimACTCCCAGGGAAGnGATTCTGTCTnTCTGT 
TCCAmACTATGACAGGAGCAGlAGAATGTCAGAGCTGTMGGGACCmTAGTTAAAGC 
CmGGCTGGTCCmCATmATAGCT(mCTMTMGTMCGTCAAAACGCAATGAG 
TTCACAGAnGGGTCTCGCCnGGCATCTMCCCATATGnCATATTCnGCTffrTTTCC 

6599 ctgtmtcctagcactctgggaggccgaggcagmggatcgcttgagcccatgagcccag 
gagtttgagaccagcctggccmcatggcaaaactccacctctacaaaaaatacaaaaat 
anagccaggcgtgatggcacacacctgtagtcccagctacngggmgctgaggagcga 
tgattacctgagcccagggaYa^ 

CATCCAGCTGGGGfiACAGAGTGAAACCCCTGTCTCAAAACAAAACAAATGAAAAAAAAAA 
[-.A.C] 

CCnAATAATCAGTAACTGTCACTTTATATrATGTTGTGAGTGTGTGTCTATATACACCT 
ATATGTATACATTTCTCnATTACACATTCATTGGTGATCTGATGTGGAGCCCCAGGGAT 
TAAGGGCMCTnGMCTACCCTGACACMTCAAGCCAMTATCATTCCCGTGGAGGAAG 
TAGAGTATCTAGGlTCTGTCTCCTAGTrGCAGCTnACCTTGAGGACAGAGACTCTAATC 
CAGCTGTGCTGMGGAGCACATCTCCTGACnCTGAGCTnCCCCTGGTAMTTCAAACT 
6983 CACATTCATTGGTGATCTGATGTGGAGCCCCAGGGATTAAGGGCAACT7T6AACTACCCT 
GACACAATCAAGCCAAATATCATTCCCGTGGAGGAAGTAGAGTATCTAGGTTCTGTCTCC 
TAGTTGCAGCTTTACC7TGAGGACAGAGACTCTMTCCAGCTGTGCTGMGGAGCACATC 
TCCTGACnCTGAGCTTrCCCCTGGTAAATTCAAACTGGATGTCACGGCGCCCTCAGATA 
GAGCCTGGTMTTTGCCCTGGGGAGAGTGACTGTCTlTrGGATCTMTTTGACTTTTGCC 
[C.G] 

CAGTTGGAGGAAAATCTTCAGGGCTAGGAAGGATTGTATTTGTCTGACCCCAGAGATAAC 
CTGGGTTnGAGGMCATGGGGCATCMCCTGMTGGTCTTGTAAGATCTCTCCCACGCC 
AGCTTGCC^GTGTTTCTCTGATGAATTTAGAGTACCTGAGTAGTGCAGGCCTGCTGGGAG 
GAGGACTCTCCCTCTGTGCTACTCAGAGAAAnCATTCTTCAAGGCCCCCTTCCAGCCTT 
GCTCTTACCCAGCTGGGCTACAGTTACMTAMGGAAATGACTTTTCTTCTCCCCTTCCC 

9885 GGCGTGCCACCACACCTTGCCATTTTTTTnATTrTMGTAGAMCMGGTCn^ 

ACTATGTTGCCCAGGCTGGTCTTGAACTCCAGCGATCCTCCTGCCCCAGCCTCCCAAAGT 
GCTTGGGATTACGGAAGTAAGCCACTGTGCCTGGCCAGTGCAACCCCCAT7TTATACTAA 
AACAGGAAGGCCCAGAAAGGTTTGGAGTAACTTGTCCAGGGTCACACAGATGATATTTGA 
ACTCAGGTCTCCCTGGCTCCCMGAGAGTCTGCTTTCCACTAGGACTCCCAGGAGAAAAA 

[A -] 

AAAAAAAAAMCAGTAGACTTGGAGACAGAAMTCTGATTTGAGTCTTAGTTGAGCTAGG 
CTAACTGTGTMCTGTGGG(^GTTCCTTAGCCCCTGTGAGCCTCAGT7TCTTATCTGTA 
AMTGTCATAAMGAMTCCATCTCATGGAGTAGTTGTGATGATCAAGGACTCTGAAAAC 
ATTAGAATGGTTTAATGTGAAGGATTAGCAGCAGCAGATGGCAACATTGTGCATCTTATA 
TTMCTATCCAMTATATCMGCGTCATTTGCTATATATAAAAGTCATCAAATTAGGC AC 

12538 ACTTGGGAGGCTGAGGCAGGAGMTCACTTGAACCTGGGAGGCAGAGGTTGCAGTGAGCC 
CAGATCACGCCACTGCACTCCAGCCTGGTGACAGAGTMGiACTCCATCTCAAAAAAAAAA 
AAAAAAAAAAAMnCCTTMmGGCCTACAGTAGAGCCCTCCGTAATGTGGCCTCTCT 
CCACATCTCCACMCCTCCTGGTCCCTGCACTTCAGCCTCACCTCTCTTCTGGACAGGCC 
CTCCTTCTGACMGCGCTnGTTCATTCTGCTCCCTCTGCCTAGAATGCCCCCTTACTCT 

rp T*i 

TTCACTTMCTCCTGCTTATCGTnAGATCTTTACCTGGATGGCTCAG^ 

GTMTTCCTWCCCtGAAAMTAGGTTAGGTCCCTGTTTTATGTTTTCATAGACCT^ 

TTTGAGGCTTTTTTTAAAAAAGTAGTTTT^ 

TMTGATATCnMGACCTCTMTAGAACMTTTGGTCATGGACTGTGGGGTTTTTGCCC 
CTCATTGTGTCAGCACTGAGCATATTGTTGGCATAGGAGGGATATrrGTTGMTGAATTG 

17707 GTAGTGGGTGCTCAGAGTGTTTGCTG^ 

CACTTGMTAAAGTCCATCCAGTATGCACCAnACCATCTCTTCGCTCTACAATAnCTT 

TrAGGCMGAGCnATCTTrTGAGGTGATMGATMGCTCAMCmTGTAGACW 

CTCAGTCTGTAAATGTCATCCC^^^ 

GCATGCCTCTGCMCTtHAGC^ 

[T C] 

CCCAAMGCTAGAGTCCCnCTCCCATGGGCAGTGCTGGMGTGTGCTMCAAATTCT^ 
CTCCATACTGCnACGAnACAAAAAAAACCCTCAGCATCTCATGCCAGACTTGAGTTM 
GGTTGTTnCTTTTGTGTGTCAGiCTGW 

GAGATrrTGCTGAGATCAGAGGGTGCTCCACTGCCATCAGTAGCACTGACTCnGCAGAA 
GCACCGTTTCTGMGTTGGCTMTGTCATCCCTCACGTnGTnGTnGAMTTTGTnT 
18219 TGCCATCAGTAGCACTGACTCTTGCAGAAGCACCGTTTCTGAAGTTGGCTAATGTCATCC 
CTCACGTrTGmGmGAMTn"GTmAGnCCAGAGATAGCACTTTCATGGAATGAC 
GCTATCTTCTAGAATCAC 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 GAGTTGGAGTCTCGCTGTGTCGCCAGG 
CTGGAGTGCAGTGGCACAATCTCAGCTCACTGCAATCTCCACCTTCCGGGTTCAAGTGAT 
TCCCCTGCCTCAGCCTCCCGAGGAGCTGTTACTACAGGCGCACACCCCCACTCCTGGCTA 
[-.A] 

TTTTATGTGTTTTAGTAGWGACGGGGTTTCACCGTGTTGGCCAGGATGGTCTCGATCTCC 
TGACTnGTGATCTGCCTGCTTCAGCCTCCQ\MGTGCTGGGATTACAGGTGTGAGTCAC 
CGCGCCTGGCCTAGAATCACCTTTTTATACCATAACGTGAGCACCACTGCCGCGTCACCA 
AGGAMGAGAGAGGCAGCTACTGTGGGGTTACAAATGGGTAAGAGTGGCACCAGGAAGGT 
GAMGTCTCTACmGCC^GGCnMCAAMTGTCMTCACCAMCATnATTTATTAA 

19670 GACCCCCATGATGAGCMCTATAGWCTAGAACAGTGATMTMCTMTGTTTATAATGC 
ATCTTCAGTTTACAGAGGGCTT7TGTACTCATCATCTAGTTTAGTTCCTGCAACAACCTC 
TTGAGGMTATAGCACMGCAGGACMGGGAAGCCCAGAGATGTTAMTMTTTATCCAA 
GTTTATGCTGCTGGGMGGGCAGCACTGAMTTAAMGAAMGTnTCTGAGCTCAAATC 
CCATGCCCTTTCCTCAATGTGAGCTCTAGCAAGGTATTCAGGAATCCTGCCTCTACAGTT 
[C.T] 

AGAGCCTCAMnGCTGGGTATGnGAGTTCnGTATCTGATTTTTCTAGATTTCCTGCC 
CACATTCnACTGTCTGGATATCAGGAAAGAGTTTATCAAATGCCTGTGGAAATCCAAGA 
TMGGTCTCATGATGAGTMCCCAGT^GAAMCATGMGTCMGTCTAACTAGTCACTACT 
ATnCACTACTGCTGACTCCTGATGATWGCTCCTmCTMGTGCmCTGTCCAC™ 
nCCATCATCTGCCTAGAATTTATCT 

21153 GGACCCnGTTTTAGMGGATGACTGCTGCTATMTGTAGAAAGTGATTTGG^ 

AGGAGTGGGGCACGAMGATGGnAGTAGATGGGGGTGGTMTGCTTACCTTTCAGTATT 
TGGAGGCTTCGGAGTCCTCAAAAATTCTCnCCTTGATTGGAGTCCTCCCAGCCMTAGA 
GGGCTTCACACAMCAGTTTCTTGGGTTTT^ 

AMGGnGGGGTGAnCATTCACTTACCACACCnGCCTGMCATTCACTTGGGGCTGCC 
[G.T] 

GTTATGAAGGCTATTGnCTCCAGCCTGTC^ 

GGTTCTMGGAGTCAGTTTGTTCAGCTCCGTGCWGGTTTCCAACTTATGAAATGTGCTG 
GAGATTMCACCTCTCCTGCCATTnATCCCTACTATMnGCCAGTCAAAGGATTCCTG 
CAGTTGCCTCTGGCAGCCATMCTGATGAATGTTCTGCCAGCTGCTCTGAGGACCTAGAA 
GAGCAGTmCTATCCAGGACCAGTnCCMGGGTGGGAGGGTGAAATATATCCTCCAGT 

24566 CTACTCTGGAGGCTGAGGfGAGAGGATCACnGAGTCWGAAGGTCGAGGTCAAGATTGT 
AGTGAGCCATGATGGCATCACCGCACTCCAGCCTGAGTGACAGAGAGAGACCCTGACTCA 
AAAAAAAAAAMCAAAAAAAAAAMCACCCTCACGACTTATCAGCTATTTGTCTTGAGM 
TACTGACATMCCCCTCAGMCCTAmCCTMTCTGHAMTGAGGCTGATGACGmC 
CTCCTmACTGGCMmAAACATGATGGATMTAMTGCTMGCACTTAACACAGGGC 
[C-] 

TAGAAGATATTMCTGCTCMTAMTGCTAGCnCTTMCAGTATTC^ 

CTTAT(^CATGCAnGTTGTCCCTGTGTCCAGTTGGTGGMTGGGAAMGGCTCCCtTGT 

MCCCCATCTACCATCTTTATCAGACTTTCCTGCCATGGnCACAGTMGAGATAGAAGC 

TGCACGGTGACTTCTGGCTCTnACMTGGTGAGCGGTGTGTGCCTGGTAAGGGAGAGCT 

GATGT(^CTGCCCCAMTCCAGTAGTGAGATn'GAGTGTTCTGGTTTCCTCCAGCAGCCT 

FIG.3-28 
26604 GATTT6CAGCTGAGCCTGTCTATCT6GTGTGGGAAGAAGATGGGGAGTTACTTGTCAGTC 
CCGGCTTACTTCACCTCCAGAGACCT6TTTCGGT6AGTTGGTCTCCGA6TTCCCCTCTCC 
ATCTCTCCTGGCCCCTGGTCGTGAGAGGAGGGTGGTCTCCGTAAATCTCCTTCTCACTTA 
GTCCTTTACCATCGGTTCTGCCGGGCAGAAGCCAGCGGAGGTTATACCGAAGGAGAATCG 
GCCWGTGAGGTACCCCCATTATGTCCTGGAAGTGGTGAGGGGAGGGATATACCCAGAAG 

[G.A] 

AACTTCTTAGGGAGCTCCAGCTCCCCTTCTATCCCAGACAAACCTGAAGGAGCCTCCAAA 
AGATGCCACTGACCTGCCCATTGTAGATGTTACTGCTTCCGGGGGGAATAGCCCAAATAG 
AGTGCTGTTTCCAGCTCTCACATGTCTTACCTGCGGGCCATGCTGCCTGCCCAGGAATTT 
GTCCCMCMGCAGGATGGGCAGGTTTTGCCAAACTGTGGAAACTGGCAAGTCCTGGGTG 
TGGGTAGCCTGGTACACAGTAGGCACC7TATAMCGTTTGTTCTCTTAATGGCAGGCACA 

27255 TGGGGAMGACCTGGGCGAGTGCTTCTAAGACTGGAGCAATGGGCTTTAGAGTGTTCCTG 
AGCTGCTGGGCCAGCCCCCACACCTCCTCAGTCCCTAGGCCTAAGTACCTCCACGAGCCT 
CTCTCTGTGGGGCnCTCAGAGGGAGATGTGGAAAGTCTACCTCTMCCTGGCTTTCTTT 
GCTCATTGCCCCACTCCACCTCCCATAGAAACTCCCCAGGGGGTTTCTGGCCCTCTGGGT 
CCCTTCTGAATGGAGCCATTCCAGGCTAGGGTGGGG 1 1 IGI 1 1 1 CATTCTTTGGGAGCAG 

[C.G] 

CTGnGTTCCAAAAAGGCTGCCTCCCCCTCACCAGTGGTCCTGGTCGACTTrTCCCTTCT 
GGCTTCTCTAAGCTAGGTCCAGTGCCCAGATCTTGCTGCCGGGATACTAGTCAGGTGGCC 
AGGCCCTGGGCAGAAAAGCAGTGTACCATGTGGTTTTGTGGAATGACCGGACCCTGGTAG 
ATTGCTGGGAAGTGTCTGGACAGGGGGAAGGGGGAAGGGAACTGGTCCTCAATGCTGACT 
CTACCMGCGCCCTGCTAGACACTnATCCTTTOA^ 

27399 AGATGTGGAMCtCTACCTCTMCCTGGCmcmGCTCATTGCCCCACTCCACCTCCC 
ATAGAMCTCCCCAGGGGGmCTGGCCCTCTGGGTCCCnCTGAATGG^GCCAnCCAG 
GCTAGGGTGGG GI I IUI 1 1 ICATTCTTTGGGAGCAGCCTSTTGTTCCAAAAAGGCTGCCT 
CCCCCTCACCAGTGGTCCTGGTCGACTTTTCC(TrCTGGCTTCTCTAAGCTAGGTCCAGT 
GCCCAGATCTTGCTGCCGGGATACTAGTCAGGTGGCCAGGCCCTGGGCAGAAAAGCAGTG 

[T C] 

ACCATGTGGTTnGTGGMTGACCGGACCCTGGTAGATTGCTGGGAAGTGTCTGGACAGG 
GGGAAGGGGGAAGGGAACTGGTCCTCAATGCTGACTCTACCAAGCGCCCTGCTAGACACT 
TTATCCTTTMTCTCTCMCAGCCTAMGAGAnATATATCCCC ATTTTA CAGATGAGGC 
MCCAGTTTCMCAGAGnMCATATGGAGCCTCACTGGGCAGCTfTTTCTGTCTTCCTG 
ACmCTCTCATCCnCAGGGGGCTGCAGGmGTnTCnCTCCTAGTGGAGAGGAAAT 

28088 MGAGCCMTGGAAATTGATCnGAGilTrAGGAGAAAGCTTTrACA^ 

GCCMGTGTTGMGTAGCCACATTTCAGGTCCTCAnMTnCTCTTAATCCTGGGAAGG 
CAGCTTAGGAGMGGGTTGTTCGTTTAGGAGCCAGGMCTATACCCCTTnACCCn * 
GAGGCAGGGMGCCAGGGAGGACACMCnCTCAGGMGAGGAGAAGCTAGAGCAGATAG 
TGAACTCTCAACCTGAACCTTTAAGGGCCAGACCAGTAATGCCACCCAAGTCCACCTGCC 

[G.A] 

TTTGTCTTGnCTGTCCCAGGCTnCTGGAGAACCTGATCTTCTTGCCCCTACCCCCAAG 
CTCCGTTTGCCCAGCTAGAGTCTGGGGGGTACTGACTGACTTTCGTAGACATTCTTCCCT 
TCCCCAAATMGAGGCCACATTCeTGAAGTCACTTCTGAAGAGATAGCTGCCACACAGGG 
CTCTnCCCCCCAGGGAGGGACCACCCAGACCCTCTGCTCTCCCAGGTATCCGTTACCAC 
ATCACTACCTGGTCAGAMGCTGTnCTGCCAnAGCCCCTCCCTCTTnATTATAGGAT 

FIG. 3-29 
28734 AAGTAGAAGCTAGACTTCTTGGGCTCCTGAACAGGGTCCTTGCTGGATTCTGTGAAACAA 
ATTMGTTCTT6ACCCTAGGCCTCTG6GGGAGTACAAAGTCTATGGGAGTTCTGGGGCTG 
TGGTTGCAAGGAAAGTGACGCAACCAGATTCCATGGGGACATGATCAGGCGTGACATGTG 
AGGGAGGMGAGGGAGCAAGGGAATGAAGAATACAACTTCTGTGTGCCATACACCCCTGC 
CTGACAGGCCATACATACTCAGCAGAGAATGCACTGTCtTTCCTACCACACTAGCGTGAG 

[G.A] 

AGTGAGCTGCMTTACCACTGTGCTTCCMGTMGAAMTACCTCA MTTGGMTTTA CA 

AMGAGGTAMTTAGGGAGTGGCTTTTGTCGGACATOT 

GMTnCACTTMTGTCCMTACTGATnMTGAGCnGGGTTTACACATTATCTCnGA 

AGAAMCAMTGMCCTnGTGTTCCAMGCMTCCATGTTTAMGGGAAAAAATTATGC 

ATAACTCTGCCCAGCTTCACAGTAACCTTTGGCAGGTGCCTTAGGTCCTCTGGGACTCTT 

29246 MTCCATGTTTAMGGGAAAAMnATGCATMCTCTGCCCAGCnCACAGTMCCTTTG 
GCAGGTGCCTTAGGTCCTCTGGGACTCTTTTCCTTATCTGAAAAATGAAGGACTTGGATC 
AGGTGMTGGnCCeAGCTCTGCMCnATGTGGCTCCTCAGAGGCACACMGCTCTTTT 
CCATTATTTGCCAMTMTGGAGGCCCTGTCTTTMCTGCAGTACAACTACACAAAATAC 
TTGAMCTACAGTCTTCCTGGTT7TTGGTTGGAACTGMTCAGTGCACTCTAGCM 

[-.T] 

ATncnGCTGTTCGTAGGCnCATW 

ATATTCCATMTMTTACAGCTTMTTGGCAGACTGTTTCAGTCTATAGGATCTGCAGGA 
AGGAGGAGTMTAAAGGGATTTTTGACTGAGCTCTTATGGAACAGAGTCTCTCTAGGCCC 
CTGTCATATCTGCCCTTCTGGGCCCTGGGGAAAAGTTGGCATCCCCAGTTGTGGTGCTCT 
CCAGGTGCCCTCAGGCTGTGGTGGAGGGAGCTTCCCAnCTCTCCTTCAGCCCACTCAAT 

29490 MCTACAGTCTTCCTGGTTTTTGGnGGMCTGiMTCAGTGCACTCTAGCMCACTTATT 
TCTTGCTGnCGTAGGCTTCATTATGTGTTTGGTTMTTTTTTAAMCMCMTM 
TTCCATMTAATTACAGCTTAATTGGCAGACTGTTTCAGTCTATAGGATCTGCAGGAAGG 
AGGAGTMTAMGGGATTTTTGACTGAGCTCTTATGGAACAGAGTCTCTCTAGGCCCCTG 
TCATATCTGCCCTTCTGGGCCCTGGGGAAAAGTTGGCATCCCCAGTTGTGGTGCTCTCCA 

GTGCCCTCAGGCTGTGGTGGAGGGAGCnCCCATTCTCTCCnCAGCCCACTC 

AGGCTAGGGGCTGAAAGAAGCTTCTCTACAACTGGCTGTTCACTGGGAGGTTAAGGGATG 

ACCATCCAGCCAGGCCTTCCTCAGGACATGGGAGGGCTTATGCTTTAACATGTGTAAATC 

CACTGCMTMTGACTGGnCTTTTACCCCATMGGTTGAGAATnACCTGTAAACATTT 

TTGTCTGAAGMTTTGGATGTMGTGAGGGCTGGGCCTCTATCnATCTCACnGGCTTC 

29934 GGACATGGGAGGGCTTATGCTTTMCATGTGTA MTCCAC TGCMTMTGACTGGTTCn 
TTACCCCATMGGTTGAGAATTTACCTGTAMCATTTTTGTCTGiAAGM 
GTGAGGGCTGGGCCTCTATCITATCTC^^ 
nGncnACACATCCTAGATGCACAGTMCTATrtCCTMTTAnAGAM 
AT^TTGATTTCAGCTGGGCTTGGTGGCTCCnCCTGTAATCCWGCACTTTGGGAGGC 

[T C] 

AAGGCTG^GGATCACCTGAGTCCAGGAGTTTAAGACCAGCCTGGGCAACATAGGGAGAC 
CCTGTCTCTACAAAAMTAAAAAATTAGCCAGGCATGGTGGTGTGCACCTGTAGTCCCAG 
CTACTCAGGAGGCTGAGGCAGGAGGATCTCTTGAGCCTGGGAGGTCAGACTACAGTGAGC 
MTGATTGTGCCACTGCACTCCAGCCTGGGTGACAGAGTMGACTCTGTCTCTTAAAAAA 
AAAAAAAAAAMGTTGATTTCTATnGGATAGATAMTMTTCATTnAGGAC 
34480 CTGACTTCAAGTGATCCACCCGCCTCGGCCTCCCAAAGTGCTGGGATTATAAGCATAAGC 
CACTGTGCCCAGCTGCTCTCTATATTmMTACATAmmCCATTMTmCACAK 
AGTTCATrTTATAGATGAGGAAACTAGGCCAGAGAAGTAAAATATCTTGCCCAAGATGAT 
GTMCTAGTMGTGGCAGGATCMGATTCAAACCMGCMTGTTCAAACCTCTTGGAAGC 
MGMTGTGGCCACTGtGGMGGTGCMGGCCtTGACMCMGMTAGGGAAAAGAAGGA 
[A.G] 

CTAGAAGGAAAGAGATGGCATGGGCTCAGCAGGCCAGGGAGCTCTTAGCTGTGTGTGTTG 
GGAAGCTCAGAAGGGAGGAAGAGGTTGTCTGTGCAGGTAAGTCCTGAGAACACACCAGAC 
TTTTGAGAGGIGGAGCTTCATAGCCAGGTCATTAGGGGAGAAGGGAGCTATAGATTTTTT 
1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 II 1 1 I I A GAGACGGGGTCTTACTATGTTGCCCAGGCTG 
GTCTTGMCTCCTGGGCTCAAGTGATCCTCCCACCTCAGCCTCCCAAAGTGCTGGGATTA 

38812 AMTCCAGCAGATCCAnGAGAGmMGCAGCMGGTGnGTGACCMGTTMCATTTT 
AGAAGGATCACTGGTATGGAGGTTGGATTGGAGAGGGGAAAGCCTAAAGGTATAGAGACT 
AGnAGGMGCTATTGTAGGCTGGGCATGGTGGnCATGCCTGTAATCTCAGCACTrTGG 
GAGGCTGAGGTGGGAGGATTGCTTGAGGCCAGGAGTTGAAGACCAACCTGGCCAACATAG 
CMGACCCCGTCTCTGTTmcnMTTAAAAGAAAAGTCCAGACGTAGACATAGTGGCT 
[T.C] 

ACGCCTGTMTGCWGCACmGGGAGGC^GGTGGGCAGATTGCnGAGGTCAAGAGT 
TTGGGATTAGGCCAGGCGCAGTGGCT(^CGCCTGTMTCCCAGCACTTTGGGAGGCCGAG 
GTGGGCGGATCACAAGGTCAGGAGATCAAGACCATCCTGGCTAACACAATGAAACCCCGT 

ctctactaaaagtacaaaaattagccgggcatggtggcggacgcctgtagtcccagctac 
tcgggaggctgaggcaggagmtggcgtgaacctaggaggcggagcttgctgtgagcaga 

40731 gtrctgtcctatgtctgtctctcggatgaagctgagctggctttcagaagcctgcagagt 
taggaaaggaaccagctggccagggacagactatgaggattgtgctgacccagctgcccc 
tgtggggatcacagtttacagccagagcctgtgcggacccagctgtctgccaggtttcct 
tagaaacctgagagtcagtctctgtccactgaactcctaagctggacaggaggcagtgat 
gctaaaccctgaagggcaacatggcctatggagaaagcatggagctcagagcctggagta 

[C.G] 

GGGCACAGATAGGATTGMTAAAnGTGTAGAAAGACTTTGAAAACAATAAAGCAAAAGA 
TGMTGMCGTTTTTTTTAGACTTGAGGGACCAACAACCCCCAAACCCCAGATTCTGCCA 
GGTCCATGGGGMGGAGAAGTTGCCTTGAGTGGAAGCCCCAAGTAGGGAGACTTACAGAA 
AAGAAGTCAAGAGCACTGGCTCCCAGGCAGAAATACTGATACCCTACTGGGGCTTCAGGC 
TGAGCTCCTCCCTTCACAAATCACTTCATCTCTCTGAGCCTGTTTCTGCATCTGTGACAT 

41303 CTCTGAGCCTGTTTCTGCATCTGTGACATAAGATGGTAAGATAAAGGTGGCTGTCTCACC 
MTTATGTMGGAnAAATGTGGAAAAGGACATAMGTTGTATAGTGCTGCCATAGGGAC 
AGTGnCAGTAMCGTGACACAncnAGTATCACTMGMTCAGGTTCTTGGCCAGGCA 
CCGTGGCTCATGCCTGTMTCCCMCACTCTGGGAGGCCTAGGTCGGAGGATGGCTTGAA 
CACAGGAGTTTGAGACCAGCCTGAGCAACATAGTGAGACACTGTCTCTACAAAAAAAAAA 
[T.A] 

MTMTMTMTTGTTmAAmGATGGGCAGGGCACTGTGGCTCACACCTGTAATCCC 
AGCACTTTGGGAGGCCMGGCCGGAGGATTGCTTGAGGCCAGGAGTTCAGGAGCAGCCTG 
GGCCACAnCCTGTCTCTACAMGAATAAAAAAGTTAACTGGGCATGGTGGCACATGCCT 
GTMTCCCAGCTACTCMGAGGCTGAGGAGGAGGAnGCCTGAGCCCAGGAGTTCAAGAC 
TGCAGTGAGCCTTGATCACACCACTGTACTACAGCTTGGGCAACAGAGTGAGACCTTGTC 
41305 CTGAGCCTGTTTCTGCATCTGT6ACATAAGATGGTAAGATAAAGGTGGCTGTCTCACCAA 
TTATGTAAGGATTAAATGTGGAAAAGGACATAAAGTTGTATAGTGCTGCCATAGGGACAG 
TGTTCAGTAMCGTGACACAmnAGTATCACTMGMTCAGGnCTTGGCCAGGCACC 
GTGGCTCATGCCTGTAATCCCAACACTCTGGGAGGCCTAGGTCGGAGGATGGCTTGAACA 
CAGGAGTnGAGACCAGCCTGAGCMCATAGTGA 
[-.A] 

TMTMTMnGTTrTTAAnAGATGGGCAGGGCACTGTGGCTCACACCTGTAATCCCAG 
CACTTTGGGAGGCCAAGGCCGGAGGATTGC7TGAGGCCAGGAGTTCAGGAGCAGCCTGGG 
CCACAmCTGTCTCTACAMGMTAAAAAAGTTAACTGGGCATGGTGGCACATGCCTGT 
AATCCCAGCTACTCAAGAGGCTGAGGAGGAGGATTGCCTGAGCCCAGGAGTTCAAGACTG 
CAGTGAGCCTTGATCACACCACTGTACTACAGCTTGGGCAACAGAGTGAGACCTTGTCTC 

41457 CTAAGAATCAGGTTCTTGGCCAGGCACCGTGGCTCATGCCTGTAATCCCAACACTCTGGG 
AGGCCTAGGTCGGAGGATGGCTTGMCACAGGAGTTTGAG ACCAGCC TGAGCAACATAGT 
GAGACACTGTCTCTACAAAAAAAAAATMTMTMTMTTGTnTTMTTAGATGGGCA 
GGCACTGTGGCTCACACCTGTMTCCCAGCACTnGGGAGGCCAAGGCCGGAGGATTGCT 
TGAGGCCAGGAGnCAGGAGWGCCTGGGCCACATTCCTGTCTCTACAAAGAATAAAAAA 

[G.C] 

TTAACTGGGCATGGTGGCACATGCCTGTAATCCCAGCTACTCAAGAGGCTGAGGAGGAGG 
ATTGCCTGAGCCCAGGAGTTCAAGACTGCAGTGAGCCTTGATC ACACCACTGTA CTACAG 
CTTGGGCAACAGAGTGAGACCTTGTCTCCAAAAAAAAAAG 1 1 IGt 1 1 1 1 1 1 1 IATCCACT 
CTCCTCACCAMCAMCTGAGTAAGTTAGAGCCCTCTCAGCTGGCATGTGTTGGAAACAG 
TGCCCTCTCAnAMGTGCTGCCCTCACTCCCATTGCCTCTTGGCCTTGGTCAGTATGAT 

43168 AGCTACTTGGGAGGCTGAGGCAGGAGAATCGCTTGAACCTGGAAGGCGGAGGTCGCAGTG 
AGCCGAGATCGTGCCATTGCACTTCAGCCTGGGCGACAGAGCGAGACTCTGTCTCAAAAA 
TAATAATAATAACAATAACTAGCCGGGCCTGGTGGCACATGCCTGTAGTCCCAGTTACTC 

AGGA(^GGAGGCATGAGACTCAGGTGMCTAGG6AGACAGA(^1TGCAGTG^^ 
TWCACCACTGCACTCCAGCCTGffTTGACAGAGCGAGACTCTGTCTCAAAAAAAAAAAM 

CCCATTTGCTCATTTTTTGGATACTAGTATMCTATC^^ 

ATCMGCAGATATGGGAGATGGTGAATTACCATCTACAGTGTTGTCATATATGTCACATA 
CTGAGCAnATCAGCTAGTAGAATCTAGnMTTGnCTATGTGTGATGTATGCAGAGTT 
CCCATTTTGAATGTG1TTTTACTATGCTTAAATAAATGACTGATGTCAGCAACCCCAAAA 
TGATACATCTGATGTMGAGCCCCTGTTCCCCMTMTMCATCTAMCTATAGACATTG 

43357 AGGCATGAGACTCAGGTGAACTAGGGAGACAGAGGTTGCAGTGAGCCAAGATCACACCAC 
TGCACTCCAGCCTGGTTGACAGAGCGAGACTCTGTCTCAAAAAAAAAAAMTCCCAT^ 
CTCATTTTTTGGATACTAGTATMCTATCACTCTAMCCAGTTAGTACTTAAATCMGCA 
GATATGGGAGATGGTGMmCCATCTACAGTGnGTCATATATGTCACATACTGAGCAT 
TATCAGCTAGTAGAATCTAGnMTTGTTCTATGTGTGATGTATGCAGAGTTCCCATTTT 

MTGTGT1TTTACTATGCTTAMTAMTGACTGATGTCAGCAACCCCAAAATGATACATC 
TGATGTMGAGCCCCTGnCCCCMTAATAACATCTAAACTATAGACATTGGAATGAACA 
GGTGCCCCTMGmCCTCCCTCC^GGmCTTGGCCGGTCTCTGAGGACTACACATrc 
CTACTCCCGTCmCCTCATCTTCAGGCGCAGTAACAGTATCTCCAAGTCCCCTGGCCCC 
AGCTCCCDW\GGAGCCCCTGCTGTTCAGCCGTGACATCAGCCGCTCAGMTCCCTTCGT 

FIG.3-32 
45664 CCAGCTTTl



47549 



, .CCTTGGCTTCCCCCACCCCCAGGTGAMGTGATGCGCAGCCTGGACCACCCC 

A^G^GGCTTCACTGGGAGACCA(^TTGACCCATGGGGCCTGGACCACGAGTGGGACAGG 
GCtSgIXTC^^ 

GGTCAGGTnGCCAA^GMTCGCCTCCGGAATGGTGAGTC^ 
CAGSGCGAGAGTAGGGftGAGCTG^ 
TCCTATGCMOT^ 

MTTAGCTGGGCGTGGTGGTGCACGCCTGTAGTCCCAGCTACTGAGGAGGCCGAG^AGG 
AGAATAGCTTCAAC^GGGAGGCAGAAGTTGCAGTGAGCCMGAT^ 

fA(£CTGGGTG^ 

A1Qm R(^GTTACGAGTGGCTTGAGGTGTCACTTACCAGACATTTGGGGGATGGGG^ 
47908 ^^GTTGAGXA^TGGTT^GMGAGCTAGCAHGA^^ 

Sgtcacctmtctgcagagaagg^ 

tg^ggct^gaaaaamcagacacacaagagtctcagga^^ 
agt^ggaStttSgcactccctca^^^ 

CAtt^ 

GAAGGAAGATrTGTTTTAAAAAAnGnnCTTTMTATTMTTGMCACCTU 

FIG.3-33 



U.S. Patent Jan. 22, 2002 Sheet 39 of 41 US 6,340,583 B1



54654 GGCCCCGGCCCC6GCCCCCA6GCCAGGCAGTGGCG6CCAAGGACCACGCATCTACTTTCA 
GAGCCCCCCCCGGGGCCGCAGGAGAGGGCCCGGGCTGGGCGGATGATGAGGGCCCAGTGA 
GGCGCCAAGGGAAGGTCACCATCAAGTATGACCCCAAGGAGCTACGGAAGCACCTCAACC 
TAGAGGAGTGGATCCTGGAGCAGCTCACGCGCCTCTACGACTGtCAGGAAGAGGAGATCT 
CAGMCTAGAGATTGACGTGGATGAGCTCCTGGACATG6AGAGTGACGATGCCTGGGCTT 
[T.C] 

CAGGGTCAAGGAGCTGCTGGTTGACTGTrACAAACCCACAGAGGCCTTCATCTCTGGCCT 
GCTGGACAAGATCCGGGCCATGCAGAAGCTGAGCACACCCCAGAAGAAGTGAGGGTCCCC 
GACCCAGGCGAACGGTGGCTCCCATAGGACAATCGCTACCCCCCGACCTCGTAGCAACAG 
CAATACCGGGGGACCCTGCGGCCAGGCCTGGTTCCATGAGCAGG GCTCCTCGT GCCCCTG 
GCCCAGGGGTCTCTTCCCCTGCCCCCTCAGTnTCCACTTTTGGAI 1 1 1 1 1 IATTGTTAT 

54679 GGCAGTGGCGGCCAAGGACCACGCATCTACTTTCAGAGCCCCCCCCGGGGCCGCAGGAGA 
GGGCCCGGGCTGGGCGGATGATGAGGGCCCAGTGAGGCGCCAAGGGAAGGTCACCATCAA 
GTATGACCCCAAGGAGCTACGGAAGCACCTCAACCTAGAGGAGTGGATCCTGGAGCAGCT 
CACGCGCCTCTACGACTGCCAGGAAGAGGAGATCTCAGAACTAGAGATTGACGTGGATGA 
GCTCCTGGACATGGAGAGTGACGATGCCTGGGCTTCCAGGGTCAAGGAGCTGCTGGTTGA 
[C.G]

TGTTACAAACCCACAGAGGCCTTCATCTCTGGCCTGCTGGACAAGATCCGGGCCATGCAG 
AAGCTGAGCACACCCCAGAAGAAGTGAGGGTCCCCGACCCAGGCGAACGGTGGCTCCCAT 
AGGACAATCGCTACCCCCCGACCTCGTAGCAACAGCAATACCGGGGGACCCTGCGGCCAG 
GCCTGGTTCCATGAGCAGGGCTCCTCGTGCCCCTGGCCCAGGGGTCTCTTCCCC TGCCCC 
CTCAGTTTTCCACTTTTGGATITT^^ 

54693 AGGACCACGCATCTACmCAGAGGCCCCCCCGGGGCCGCAGGAGAGGGCCCGGGCTGGG 
CGGATGATGAGGGCCCAGTGAGGCGCCAAGGGAAGGTCACCATCAAGTATGACCCCAAGG 
AGCTACGGAAGCACCTCAACCTAGAGGAGTGGATCCTGGAGCAGCTCACGCGCCTCTACG 
ACTGCCAGGAAGAGGAGATCTCAGAACTAGAGATTGACGTGGATGAGCTCCTGGACATGG 
AGAGTGACGATGCCTGGGCTTCCAGGGTCAAGGAGCTGCTGGTTGACTGTTACAAACCCA 

[A.C]

AGAGGCCTTCATCTCTGGCCTGCTGGACAAGATCCGGGCCATGCAGAAGCTGAGCACACC 
CCAGAAGAAGTGAGGGTCCCCGACCCAGGCGAACGGTGGCTCCCATAGGACAATCGCTAC 
CCCCCGACCTCGTAGCAACAGCAATACCGGGGGACCCTGCGGCCAGGCCTGGTTCCATGA 
GCAGGGCTCCTCGTGCCCCTGGCCCAGGGGTCTCnCCCCTGCCCCCTCAGTTTTCCACT 
TnGGATTTTTnVVTTCTTAnAAACT^^ 

54706 TACTTTCAGAGCCCCCCCCGGGGCCGCAGGAGAGGGCCCGGGCTGGGCGGATGATGAGGG 
CCCAGTGAGGCGCCAAGGGAAGGTCACCATCAAGTATGACCCCAAGGAGCTACGGAAGCA 
CCTCMCCTAGAGGAGTGGATCCTGGAGCAGCTCACGCGCCTCTACGAGTGCCAGGAAGA 
GGAGATCTCAGAACTAGAGATTGACGTGGATGAGCTCeTGGACATGGAGAGTGACGATGC 
CTGGGCnCCAGGGTCMGGAGCTGCTGGnGACTGTTA(y^CCCACAGAGGCCTTCAT 

[T.C] 

TCTGGCCTGCTGGACAAGATCCGGGCCATGCAGAAGCTGAGCACACCCCAGAAGAAGTGA 
GGGTCCCCGACCCAGGCGAACGGTGGCTCCCATAGGACAATCGCTACCCCCCGACCTCGT 
AGCAACAGCAATACCGGGGGACCCTGCGGCCAGGCCTGGTTCCATGAGCAGG GCTCCTCG 
TGCCCCTGO:CCAG(^TCTCTTCCCC TGCCCC CTCA(3TmCCACTTTTGGAI MINI 
AnGTTAnAAACTGATGGGACTTTG^^ 
54712 CAGAGCCCCCCCCGGGGCCGCAGGAGAGaGCCCGGGCTGGGCGGATGATGAGGGCCCAGT
GAGGWCCAAGGGMGGTCACCATCMGTATGACCCCAAGGAGCTACGGMGCACCTCA^ 

CTCAGAACTAG^ 
ScCAGGG^^^ 

CTGCTGGACAAGATCCGGGCCATGCAGAAGCTGAG(^CACCCCAGMGMGTGAGGGTCC 
CCGAC^GKGMCGGTGGCTCCCATAGGACAATCGCTACCCCCCGACCTCGTAGCAAC 

AGCMTAC^GGG^^ 

54799 gggs^^ 

&TCCTG^CATO 

GMGCTGAGCACACCCWGMGMGTGAGGGTCCCCGACCCAGGCGAACGGTGGCTCCCA 
AG^CAATCGCTACCCCCCGACCTCGTAGCAACAGCAATACCGGGGGACCCTGCGGCCAG 

ATATOACTCTGCGGCACGGGCCCTTTMTAAAGCGAGGT 



54819 



CTCAAAAAAAAAAAAAAAAATGAmCCAGCGGTCCACATTAGAGTTGAAAl 
GfiAAGCACCTCAACCT^^ 

ACGATCCCTGGKTTTC 
AGMCT^GGGTCCCCGACCCAGGCGAACGGTGGCTCCCATAGGACAATCGCTACCCCCC 

ArCTCGTAGCAACAGCMTACCGGGGGACCCTGCGGCCAGGCCTGGTTCCATGAGCAGGG 

TTTTTTTATTGTTATTAAACTGATGGGiACTnGTGTTTTTATATTGACT 

KCCTTTAOTAMGCG^^ 
TGATTTCCAGCGGTC(^CATTAGACnTGAMTTn 

55499 

AMGAGTCAAGA^^ 
GTCAGAGCAGGCC 

TTCTCCTAnGTTTnM 
56825 ACTGATGGCTCAAAGGGTGnTGAAAAAGTCAG TGAT GCTCCCCCTTTCTACTCCAGATCCT

GTCCTTCCTGGAGCMGGnGAGGGAGTAGGTTTTGAAGAGTCCCITAATATGTGGTGGA 

ACAGGCWGGAGTTAGAGAAAGGGCTGGCTTCTGTTTACCTGCTCACTGGCTCTAGCCAG 
CCCAGGGACCACATCMTGTGAGAGGMGCCTCCACCTCATGTTTTCAMC7TAATACTG 
6AGACTGGCTGAGMCTTACGGACAACATCCTTTCTGTCTGAAACAAACAGTCACAAGCA 
[C.A]

AGfiAAGAGGCTGGGGGACTAGAATVGAGGCCCTGCCCTCTAGAAAGCTCAGATCTTGGCTT 
CTGTTACTCATACTCGGGTGGGCTCCnAGTCAGATGCCTAAA^CATTTTGCCTAAAGCT 
CGATGGGTTCTGGAGGACAGnrGTGGCTTGTCACAGGCCTAGAGTCTGAGGGAGGGGAGTG 
GGAGTCTCAGCMTCTCnGGTCTTGGCTTCATGGCAACCACTGCTCACCCTTCAACATG 
CCTGGTTTAGGCAGCAGCnGGGCTGGGMGAGGTGGTGGCAGAGTCJGAAAGCT^ 

58871 CGTCACCCACCACCCMCCCCTGCCGCACTCCAGCCmMCAAGGGCTGT^ 

CATTTTMCTACCTCCACCnGGAAACMnGCTGMGGGGAGAGGATTTGCAATGACCA 
ACCACCTTGTTGGGACGCCTGCACACCTGTCTTTCCTGCnCAACCTGAAAGATTCCTGA 
TGAT6ATMTCTGGACACAGMGCCGGGCACGGTGGCTCTAGCCTGTAATCTCAGCACTT 
TGGGAGGCCTCAGCAGGTGGATCACCTGAGATCMGAGTrrGAGAACAGCCTGACCAACA 
[T.A] 

GGTGAMCCCCGTCTCTACTAAAAATACAAAAAnAGCCAGGTGTGGJGGCACATACCTG 
TAATCCCAGCTACTCTGGAGGCTGAGGCAGGAGAATCGCTTGAACCCACAAGGCAGAGGT 
TGCAGTGAGGCGAGATCATGCCATTGCACTCCAGCCTGTGCAACAAGAGCCAAACTCCAT 
CTCAAAAAAAAAAA 

FIG.3-36 
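
The SNP entries on the FIG. 3 sheets above appear to follow one repeating layout: a nucleotide position, several lines of 5' flanking sequence, a bracketed pair of alleles, and several lines of 3' flanking sequence. The sketch below is offered only as an illustration of how such a listing could be read into records. The layout is inferred from the sheets rather than stated in the specification, and the names SnpEntry and parse_snp_entries are hypothetical. The allele separator is accepted as '.', ',' or '/' because the scanned sheets are not consistent on this point, and lines damaged in scanning are skipped rather than repaired.

```python
import re
from dataclasses import dataclass

# Assumed entry layout (inferred from the figure sheets, not from the source text):
#   <position> <first 5' flanking line>
#   <further 5' flanking lines>
#   [allele1.allele2]
#   <3' flanking lines>
HEADER = re.compile(r"^(\d+)\s+([A-Za-z]+)$")
ALLELES = re.compile(r"^\[([ACGTacgt-])[.,/]([ACGTacgt-])\]$")
BASES = re.compile(r"^[A-Za-z]+$")


@dataclass
class SnpEntry:
    """One SNP record: position, flanking sequence, and the two alleles."""
    position: int
    upstream: str = ""
    alleles: tuple = ()
    downstream: str = ""


def parse_snp_entries(text):
    """Collect SNP records from a listing; entries without alleles are dropped."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = HEADER.match(line)
        alleles = ALLELES.match(line)
        if header:
            # A new position line closes the previous entry, if it was complete.
            if current is not None and current.alleles:
                entries.append(current)
            current = SnpEntry(int(header.group(1)), header.group(2).upper())
        elif current is None:
            continue  # text before the first position line is ignored
        elif alleles:
            current.alleles = (alleles.group(1).upper(), alleles.group(2).upper())
        elif BASES.match(line):
            # Sequence before the bracket is 5' flank, after it is 3' flank.
            if current.alleles:
                current.downstream += line.upper()
            else:
                current.upstream += line.upper()
    if current is not None and current.alleles:
        entries.append(current)
    return entries


if __name__ == "__main__":
    sample = "54654 GGCCCCGGCC\nGAGCCCCCCC\n[T.C]\n\nCAGGGTCAAG\n41305 CTGAGCCTGT\n[-.A]\nTAATAATAAT\n"
    for entry in parse_snp_entries(sample):
        print(entry.position, entry.alleles, len(entry.upstream), len(entry.downstream))
```

A '-' allele, as in the entry at position 41305, is kept as written and would presumably denote an insertion/deletion variant; the sketch does not interpret it further.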
    are characteristic of that subdomain and are highly
ISOLATED HUMAN KINASE PROTEINS,    conserved (Hardie, G. and Hanks, S. (1995) The Protein Kinase
NUCLEIC ACID MOLECULES ENCODING    Facts Books, Vol 1:7-20 Academic Press, San Diego, Calif.).
HUMAN KINASE PROTEINS, AND USES    The second messenger dependent protein kinases
THEREOF    primarily mediate the effects of second messengers such as cyclic
    AMP (cAMP), cyclic GMP, inositol triphosphate,
FIELD OF THE INVENTION    phosphatidylinositol, 3,4,5-triphosphate, cyclic-ADPribose,
The present invention is in the field of kinase proteins that    arachidonic acid, diacylglycerol and calcium-calmodulin.
are related to the serine/threonine kinase subfamily,    The cyclic-AMP dependent protein kinases (PKA) are
recombinant DNA molecules, and protein production. The present    important members of the STK family. Cyclic-AMP is an
invention specifically provides novel peptides and proteins    intracellular mediator of hormone action in all prokaryotic
that effect protein phosphorylation and nucleic acid    and animal cells that have been studied. Such
molecules encoding such peptide and protein molecules, all of    hormone-induced cellular responses include thyroid hormone
which are useful in the development of human therapeutics    secretion, cortisol secretion, progesterone secretion,
and diagnostic compositions and methods.    glycogen breakdown, bone resorption, and regulation of heart rate
    and force of heart muscle contraction. PKA is found in all
BACKGROUND OF THE INVENTION    animal cells and is thought to account for the effects of
    cyclic-AMP in most of these cells. Altered PKA expression
Protein Kinases    is implicated in a variety of disorders and diseases including

Reversible orotein ohosohorvlation is the main strategy for ^ also members of STK family. Calmodulin is a caicium 

^SSiSSSSSSS^^^^^^ receptormatmediatesmanycaldumre^atedpr^by 

mo« mS f 1000 of the 10,000 proteins active in a typical binding to target proteins in response to the binding of 

mammal «11 are phosphorated. The high energy calcium. Tne principle target protem m these processes* 

nhofnhaie which drivei activation, is generaUy transferred CaM dependent protein kinases. CaM-kmases^ involved 
&£^tnpho^hatemo 30 * regulation ol ismooth .muscle ^UacUon ^C bnas'), 

^tetaSprotemtoa^ glycogen ^^^"^JSA^ CaM 

orotein phosphatases. Phosphorylation occurs in response to retransmission (CaM kinase I and CaM kinase IQ. qaM 

eSllSS^(hornTonei neurotransmitters, growth kinase I phosphory ^/^f^ of ^f'^"^! 

.^^Tntiatinn factors etc) cell cycle checkpoints, and neurotransmitter related proteins synapsin 1 and II, the gene 

superfamily of enzymes with ^ety vane dtaM^ SrgtoC^.Tuek^ 

Et»Ett — Itf- be ph^phorylated by another kin*e as part of . "kinase 

phdrylate tyLLe residues (protein tyrosine kinases, PTK) activated ' «g22J2^ A2JS£ 
dual specificity and I pbosphorylate *™^gg» ^i^^S^^^ tt^ S mediates 

S ^^1a™?C term^^^^^^ «* «» *™ nonitalytic beta and gamma subunits that are 

to£fa Se^nm ' pS^ate from ATP to the unite of AMPK have a much wider 

tne transier oi me gaium y v tyrosine residue. lipogen c tissues such as brain, heart, spleen, and lung than 
hydroxyl group of » ^ *Mune, or T™' expected. This distribution suggests that its role may extend 

Subdomain V spans the two lobes. £P meUboJism ^ 

o£ the kinase domain. Tnese «d Jed amino »^gae^ ^« ^ £ ^^ce J, me nuc l euS via phosphory- 

allow the regulation of each kinase as it "JJ ution cascades. Several subgroups have been identified, and 

interactewiuitotarget P rotem.Ttepr^a^s^ different sub^te specificities and responds 

kinase domains is unserved and can ^JdotSSS to distinct extraceUular stimuli (Egan, S. E. and Weinberg, 

into 11 subdomatns. Each of the 11 subdomains contaws » 365:781-783). MAP kinase signaling 

specific residues and motifs or patterns of amino acids that R. A. (lvw; nature joj. /oi *~ 
pathways are present in mammalian cells as well as in yeast. The extracellular stimuli that
activate mammalian pathways include epidermal growth factor (EGF), ultraviolet light,
hyperosmolar medium, heat shock, endotoxic lipopolysaccharide (LPS), and pro-inflammatory
cytokines such as tumor necrosis factor (TNF) and interleukin-1 (IL-1).

PRK (proliferation-related kinase) is a serum/cytokine inducible STK that is involved in
regulation of the cell cycle and cell proliferation in human megakaryocytic cells (Li, B. et
al. (1996) J. Biol. Chem. 271:19402-8). PRK is related to the polo (derived from humans polo
gene) family of STKs implicated in cell division. PRK is downregulated in lung tumor tissue and
may be a proto-oncogene whose deregulated expression in normal tissue leads to oncogenic
transformation. Altered MAP kinase expression is implicated in a variety of disease conditions
including cancer, inflammation, immune disorders, and disorders affecting growth and
development.

The cyclin-dependent protein kinases (CDKs) are another group of STKs that control the
progression of cells through the cell cycle. Cyclins are small regulatory proteins that act by
binding to and activating CDKs that then trigger various phases of the cell cycle by
phosphorylating and activating selected proteins involved in the mitotic process. CDKs are
unique in that they require multiple inputs to become activated. In addition to the binding of
cyclin, CDK activation requires the phosphorylation of a specific threonine residue and the
dephosphorylation of a specific tyrosine residue.

Protein tyrosine kinases, PTKs, specifically phosphorylate tyrosine residues on their target
proteins and may be divided into transmembrane, receptor PTKs and nontransmembrane,
non-receptor PTKs. Transmembrane protein-tyrosine kinases are receptors for most growth
factors. Binding of growth factor to the receptor activates the transfer of a phosphate group
from ATP to selected tyrosine side chains of the receptor and other specific proteins. Growth
factors (GF) associated with receptor PTKs include: epidermal GF, platelet-derived GF,
fibroblast GF, hepatocyte GF, insulin and insulin-like GFs, nerve GF, vascular endothelial GF,
and macrophage colony stimulating factor.

Non-receptor PTKs lack transmembrane regions and, instead, form complexes with the
intracellular regions of cell surface receptors. Such receptors that function through
non-receptor PTKs include those for cytokines, hormones (growth hormone and prolactin) and
antigen-specific receptors on T and B lymphocytes.

Many of these PTKs were first identified as the products of mutant oncogenes in cancer cells
where their activation was no longer subject to normal cellular controls. In fact, about one
third of the known oncogenes encode PTKs, and it is well known that cellular transformation
(oncogenesis) is often accompanied by increased tyrosine phosphorylation activity (Carbonneau
H and Tonks NK (1992) Annu. Rev. Cell Biol. 8:463-93). Regulation of PTK activity may
therefore be an important strategy in controlling some types of cancer.

LIM Domain Kinases

The novel human protein, and encoding gene, provided by the present invention is related to
the family of serine/threonine kinases in general, particularly LIM domain kinases (LIMK),
and shows the highest degree of similarity to LIMK2, and the LIMK2b isoform (Genbank
gi8051618) in particular (see the amino acid sequence alignment of the protein of the present
invention against LIMK2b provided in FIG. 2). LIMK proteins generally have serine/threonine
kinase activity. The protein of the present invention may be a novel alternative splice form
of the art-known protein provided in Genbank gi8051618; however, the structure of the gene
provided by the present invention is different from the art-known gene of gi8051618 and the
first exon of the gene of the present invention is novel, suggesting a novel gene rather than
an alternative splice form. Furthermore, the protein of the present invention lacks an LIM
domain relative to gi8051618. The protein of the present invention does contain the kinase
catalytic domain.

Approximately 40 LIM proteins, named for the LIM domains they contain, are known to exist in
eukaryotes. LIM domains are conserved, cysteine-rich structures that contain 2 zinc fingers
that are thought to modulate protein-protein interactions. LIMK1 and LIMK2 are members of a
LIM subfamily characterized by 2 N-terminal LIM domains and a C-terminal protein kinase
domain. LIMK1 and LIMK2 mRNA expression varies greatly between different tissues. The protein
kinase domains of LIMK1 and LIMK2 contain a unique sequence motif comprising
Asp-Leu-Asn-Ser-His-Asn in subdomain VIB and a strongly basic insert between subdomains VII
and VIII (Okano et al., J. Biol. Chem. 270 (52), 31321-31330 (1995)). The protein kinase
domain present in LIMKs is significantly different than other kinase domains, sharing about
32% identity.

LIMK is activated by ROCK (a downstream effector of Rho) via phosphorylation. LIMK then
phosphorylates cofilin, which inhibits its actin-depolymerizing activity, thereby leading to
Rho-induced reorganization of the actin cytoskeleton (Maekawa et al., Science 285: 895-898,
1999). The LIMK2a and LIMK2b alternative transcript forms are differentially expressed in a
tissue-specific manner and are generated by variation in transcriptional initiation utilizing
alternative promoters. LIMK2a contains 2 LIM domains, a PDZ domain (a domain that functions
in protein-protein interactions targeting the protein to the submembranous compartment), and
a kinase domain; whereas LIMK2b just has 1.5 LIM domains. Alteration of LIMK2a and LIMK2b
regulation has been observed in some cancer cell lines (Osada et al., Biochem. Biophys. Res.
Commun. 229: 582-589, 1996).

For a further review of LIMK proteins, see Nomoto et al., Gene 236 (2), 259-271 (1999).

Kinase proteins, particularly members of the serine/threonine kinase subfamily, are a major
target for drug action and development. Accordingly, it is valuable to the field of
pharmaceutical development to identify and characterize previously unknown members of this
subfamily of kinase proteins. The present invention advances the state of the art by providing
previously unidentified human kinase proteins that have homology to members of the
serine/threonine kinase subfamily.

SUMMARY OF THE INVENTION

The present invention is based in part on the identification of amino acid sequences of human
kinase peptides and proteins that are related to the serine/threonine kinase subfamily, as
well as allelic variants and other mammalian orthologs thereof. These unique peptide
sequences, and nucleic acid sequences that encode these peptides, can be used as models for
the development of human therapeutic targets, aid in the identification of therapeutic
proteins, and serve as targets for the development of human therapeutic agents that modulate
kinase activity in cells and tissues that express the kinase. Experimental data as provided in
FIG. 1
indicates expression in humans in teratocarcinoma, ovary,    members of this family of proteins and proteins that have
testis, nervous tissue, bladder, infant and fetal brain, and    expression patterns similar to that of the present gene. Some
thyroid gland.    of the more specific features of the peptides of the present
    invention, and the uses thereof, are described herein,
DESCRIPTION OF THE FIGURE SHEETS    particularly in the Background of the Invention and in the
    annotation provided in the Figures, and/or are known within
FIG. 1 provides the nucleotide sequence of a cDNA    the art for each of the known serine/threonine kinase family
molecule that encodes the kinase protein of the present    or subfamily of kinase proteins.
invention. (SEQ ID NO:1) In addition, structure and
functional information is provided, such as ATG start, stop and    Specific Embodiments
tissue distribution, where available, that allows one to
readily determine specific uses of inventions based on this    Peptide Molecules
molecular sequence. Experimental data as provided in FIG.    The present invention provides nucleic acid sequences
1 indicates expression in humans in teratocarcinoma, ovary,    that encode protein molecules that have been identified as
testis, nervous tissue, bladder, infant and fetal brain, and    being members of the kinase family of proteins and are
thyroid gland.    related to the serine/threonine kinase subfamily (protein
FIG. 2 provides the predicted amino acid sequence of the    sequences are provided in FIG. 2, transcript/cDNA
kinase of the present invention. (SEQ ID NO:2) In addition    sequences are provided in FIG. 1 and genomic sequences are
structure and functional information such as protein family,    provided in FIG. 3). The peptide sequences provided in FIG.
function, and modification sites is provided where available,    2, as well as the obvious variants described herein,
allowing one to readily determine specific uses of inventions    particularly allelic variants as identified herein and using the
based on this molecular sequence.    information in FIG. 3, will be referred to herein as the kinase
FIG. 3 provides genomic sequences that span the gene    peptides of the present invention, kinase peptides, or
encoding the kinase protein of the present invention. (SEQ    peptides/proteins of the present invention.
ID NO:3) In addition structure and functional information,    The present invention provides isolated peptide and
such as intron/exon structure, promoter location, etc., is    protein molecules that consist of, consist essentially of, or
provided where available, allowing one to readily determine    comprise the amino acid sequences of the kinase peptides
specific uses of inventions based on this molecular    disclosed in FIG. 2 (encoded by the nucleic acid
sequence. As illustrated in FIG. 3, SNPs were identified at    molecule shown in FIG. 1, transcript/cDNA or FIG. 3,
42 different nucleotide positions.    genomic sequence), as well as all obvious variants of these
DETAILED DESCRIPTION OF THE    peptides that are within the art to make and use. Some of
INVENTION    these variants are described in detail below.
    As used herein, a peptide is said to be "isolated" or
General Description    "purified" when it is substantially free of cellular material or
    free of chemical precursors or other chemicals. The peptides
The present invention is based on the sequencing of the    of the present invention can be purified to homogeneity or
human genome. During the sequencing and assembly of the    other degrees of purity. The level of purification will be
human genome, analysis of the sequence information    based on the intended use. The critical feature is that the
revealed previously unidentified fragments of the human    preparation allows for the desired function of the peptide,
genome that encode peptides that share structural and/or    even if in the presence of considerable amounts of other
sequence homology to protein/peptide/domains identified    components (the features of an isolated nucleic acid
and characterized within the art as being a kinase protein or    molecule are discussed below).

£le^ . * "™ ^->^ t ? h - .^^ 

. . . ™ .LmKW nn/i trfln«!rint « includes preparations of the peptide having less than about 

X^J^^i5^£2SS 45 30% to'dj weigh,) othe/p'ro.eins (i.e contaminating 
acid sequences of human kinase peptides and proteins that 10% other proteins, or less than about 5% other protein* 

S^rsSn^ 

kinase peptides and proteins, nucleic acid variation (allelic    preparation.
information), tissue distribution of expression, and information    The language "substantially free of chemical precursors
about the closest art known protein/peptide/domain that    or other chemicals" includes preparations of the peptide in
has structural or sequence homology to the kinase of the    which it is separated from chemical precursors or other
present invention.    chemicals that are involved in its synthesis. In one
In addition to being previously unknown, the peptides that    embodiment, the language "substantially free of chemical
are provided in the present invention are selected based on    precursors or other chemicals" includes preparations of the

*J^^f^!^gS^ir^ 60 Scal^^^ 

^rrsetLt^om^ 

relatedness to known kinase proteins of the serine/threonine    chemical precursors or other chemicals, or less than about
kinase subfamily and the expression pattern observed.    5% chemical precursors or other chemicals.
Experimental data as provided in FIG. 1 indicates    The isolated kinase peptide can be purified from cells that
expression in humans in teratocarcinoma, ovary, testis, nervous    naturally express it, purified from cells that have been
tissue, bladder, infant and fetal brain, and thyroid gland. The    altered to express it (recombinant), or synthesized using
art has clearly established the commercial importance of    known protein synthesis methods. Experimental data as
provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous
tissue, bladder, infant and fetal brain, and thyroid gland. For example, a nucleic acid
molecule encoding the kinase peptide is cloned into an expression vector, the expression
vector introduced into a host cell and the protein expressed in the host cell. The protein can
then be isolated from the cells by an appropriate purification scheme using standard protein
purification techniques. Many of these techniques are described in detail below.

Accordingly, the present invention provides proteins that consist of the amino acid sequences
provided in FIG. 2 (SEQ ID NO:2), for example, proteins encoded by the transcript/cDNA nucleic
acid sequences shown in FIG. 1 (SEQ ID NO:1) and the genomic sequences provided in FIG. 3 (SEQ
ID NO:3). The amino acid sequence of such a protein is provided in FIG. 2. A protein consists
of an amino acid sequence when the amino acid sequence is the final amino acid sequence of the
protein.

The present invention further provides proteins that consist essentially of the amino acid
sequences provided in FIG. 2 (SEQ ID NO:2), for example, proteins encoded by the
transcript/cDNA nucleic acid sequences shown in FIG. 1 (SEQ ID NO:1) and the genomic sequences
provided in FIG. 3 (SEQ ID NO:3). A protein consists essentially of an amino acid sequence
when such an amino acid sequence is present with only a few additional amino acid residues,
for example from about 1 to about 100 or so additional residues, typically from 1 to about 20
additional residues in the final protein.

The present invention further provides proteins that comprise the amino acid sequences
provided in FIG. 2 (SEQ ID NO:2), for example, proteins encoded by the transcript/cDNA nucleic
acid sequences shown in FIG. 1 (SEQ ID NO:1) and the genomic sequences provided in FIG. 3 (SEQ
ID NO:3). A protein comprises an amino acid sequence when the amino acid sequence is at least
part of the final amino acid sequence of the protein. In such a fashion, the protein can be
only the peptide or have additional amino acid molecules, such as amino acid residues
(contiguous encoded sequence) that are naturally associated with it or heterologous amino acid
residues/peptide sequences. Such a protein can have a few additional amino acid residues or
can comprise several hundred or more additional amino acids. The preferred classes of proteins
that are comprised of the kinase peptides of the present invention are the naturally occurring
mature proteins. A brief description of how various types of these proteins can be
made/isolated is provided below.

The kinase peptides of the present invention can be attached to heterologous sequences to form
chimeric or fusion proteins. Such chimeric and fusion proteins comprise a kinase peptide
operatively linked to a heterologous protein having an amino acid sequence not substantially
homologous to the kinase peptide. "Operatively linked" indicates that the kinase peptide and
the heterologous protein are fused in-frame. The heterologous protein can be fused to the
N-terminus or C-terminus of the kinase peptide.

In some uses, the fusion protein does not affect the activity of the kinase peptide per se.
For example, the fusion protein can include, but is not limited to, enzymatic fusion proteins,
for example beta-galactosidase fusions, yeast two-hybrid GAL fusions, poly-His fusions,
MYC-tagged, HI-tagged and Ig fusions. Such fusion proteins, particularly poly-His fusions, can
facilitate the purification of recombinant kinase peptide. In certain host cells (e.g.,
mammalian host cells), expression and/or secretion of a protein can be increased by using a
heterologous signal sequence.

A chimeric or fusion protein can be produced by standard recombinant DNA techniques. For
example, DNA fragments coding for the different protein sequences are ligated together
in-frame in accordance with conventional techniques. In another embodiment, the fusion gene
can be synthesized by conventional techniques including automated DNA synthesizers.
Alternatively, PCR amplification of gene fragments can be carried out using anchor primers
which give rise to complementary overhangs between two consecutive gene fragments which can
subsequently be annealed and re-amplified to generate a chimeric gene sequence (see Ausubel et
al., Current Protocols in Molecular Biology, 1992). Moreover, many expression vectors are
commercially available that already encode a fusion moiety (e.g., a GST protein). A kinase
peptide-encoding nucleic acid can be cloned into such an expression vector such that the
fusion moiety is linked in-frame to the kinase peptide.

As mentioned above, the present invention also provides and enables obvious variants of the
amino acid sequence of the proteins of the present invention, such as naturally occurring
mature forms of the peptide, allelic/sequence variants of the peptides, non-naturally
occurring recombinantly derived variants of the peptides, and orthologs and paralogs of the
peptides. Such variants can readily be generated using art-known techniques in the fields of
recombinant nucleic acid technology and protein biochemistry. It is understood, however, that
variants exclude any amino acid sequences disclosed prior to the invention.

Such variants can readily be identified/made using molecular techniques and the sequence
information disclosed herein. Further, such variants can readily be distinguished from other
peptides based on sequence and/or structural homology to the kinase peptides of the present
invention. The degree of homology/identity present will be based primarily on whether the
peptide is a functional variant or non-functional variant, the amount of divergence present in
the paralog family and the evolutionary distance between the orthologs.

To determine the percent identity of two amino acid sequences or two nucleic acid sequences,
the sequences are aligned for optimal comparison purposes (e.g., gaps can be introduced in one
or both of a first and a second amino acid or nucleic acid sequence for optimal alignment and
non-homologous sequences can be disregarded for comparison purposes). In a preferred
embodiment, at least 30%, 40%, 50%, 60%, 70%, 80%, or 90% or more of the length of a reference
sequence is aligned for comparison purposes. The amino acid residues or nucleotides at
corresponding amino acid positions or nucleotide positions are then compared. When a position
in the first sequence is occupied by the same amino acid residue or nucleotide as the
corresponding position in the second sequence, then the molecules are identical at that
position (as used herein amino acid or nucleic acid "identity" is equivalent to amino acid or
nucleic acid "homology"). The percent identity between the two sequences is a function of the
number of identical positions shared by the sequences, taking into account the number of gaps,
and the length of each gap, which need to be introduced for optimal alignment of the two
sequences.

The comparison of sequences and determination of percent identity and similarity between two
sequences can be accomplished using a mathematical algorithm. (Computational Molecular
Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics
and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of
Sequence Data, Part 1, Griffin, A. M., and Griffin, H. G.,
eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heinje, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M Stockton Press, New York, 1991). In a preferred embodiment, the percent identity between two amino acid sequences is determined using the Needleman and Wunsch (J. Mol. Biol. (48):444-453 (1970)) algorithm which has been incorporated into the GAP program in the GCG software package (available at http://www.gcg.com), using either a Blossom 62 matrix or a PAM250 matrix, and a gap weight of 16, 14, 12, 10, 8, 6, or 4 and a length weight of 1, 2, 3, 4, 5, or 6. In yet another preferred embodiment, the percent identity between two nucleotide sequences is determined using the GAP program in the GCG software package (Devereux, J., et al., Nucleic Acids Res. 12(1):387 (1984)) (available at http://www.gcg.com), using a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6. In another embodiment, the percent identity between two amino acid or nucleotide sequences is determined using the algorithm of E. Myers and W. Miller (CABIOS, 4:11-17 (1989)) which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4.

The nucleic acid and protein sequences of the present invention can further be used as a "query sequence" to perform a search against sequence databases to, for example, identify other family members or related sequences. Such searches can be performed using the NBLAST and XBLAST programs (version 2.0) of Altschul, et al. (J. Mol. Biol. 215:403-10 (1990)). BLAST nucleotide searches can be performed with the NBLAST program, score=100, wordlength=12 to obtain nucleotide sequences homologous to the nucleic acid molecules of the invention. BLAST protein searches can be performed with the XBLAST program, score=50, wordlength=3 to obtain amino acid sequences homologous to the proteins of the invention. To obtain gapped alignments for comparison purposes, Gapped BLAST can be utilized as described in Altschul et al. (Nucleic Acids Res. 25(17):3389-3402 (1997)). When utilizing BLAST and gapped BLAST programs, the default parameters of the respective programs (e.g., XBLAST and NBLAST) can be used.

Full-length pre-processed forms, as well as mature processed forms, of proteins that comprise one of the peptides of the present invention can readily be identified as having complete sequence identity to one of the kinase peptides of the present invention as well as being encoded by the same genetic locus as the kinase peptide provided herein. The gene encoding the novel kinase protein of the present invention is located on a genome component that has been mapped to human chromosome 22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as STS and BAC map data.

Allelic variants of a kinase peptide can readily be identified as being a human protein having a high degree (significant) of sequence homology/identity to at least a portion of the kinase peptide as well as being encoded by the same genetic locus as the kinase peptide provided herein. Genetic locus can readily be determined based on the genomic information provided in FIG. 3, such as the genomic sequence mapped to the reference human. The gene encoding the novel kinase protein of the present invention is located on a genome component that has been mapped to human chromosome 22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as STS and BAC map data. As used herein, two proteins (or a region of the proteins) have significant homology when the amino acid sequences are typically at least about 70-80%, 80-90%, and more typically at least about 90-95% or more homologous. A significantly homologous amino acid sequence, according to the present invention, will be encoded by a nucleic acid sequence that will hybridize to a kinase peptide encoding nucleic acid molecule under stringent conditions as more fully described below.

FIG. 3 provides information on SNPs that have been found in the gene encoding the kinase protein of the present invention. SNPs were identified at 42 different nucleotide positions. Some of these SNPs, which are located outside the ORF and in introns, may affect gene transcription.

Paralogs of a kinase peptide can readily be identified as having some degree of significant sequence homology/identity to at least a portion of the kinase peptide, as being encoded by a gene from humans, and as having similar activity or function. Two proteins will typically be considered paralogs when the amino acid sequences are typically at least about 60% or greater, and more typically at least about 70% or greater homology through a given region or domain. Such paralogs will be encoded by a nucleic acid sequence that will hybridize to a kinase peptide encoding nucleic acid molecule under moderate to stringent conditions as more fully described below.

Orthologs of a kinase peptide can readily be identified as having some degree of significant sequence homology/identity to at least a portion of the kinase peptide as well as being encoded by a gene from another organism. Preferred orthologs will be isolated from mammals, preferably primates, for the development of human therapeutic targets and agents. Such orthologs will be encoded by a nucleic acid sequence that will hybridize to a kinase peptide encoding nucleic acid molecule under moderate to stringent conditions, as more fully described below, depending on the degree of relatedness of the organisms yielding the proteins.

Non-naturally occurring variants of the kinase peptides of the present invention can readily be generated using recombinant techniques. Such variants include, but are not limited to deletions, additions and substitutions in the amino acid sequence of the kinase peptide. For example, one class of substitutions are conserved amino acid substitution. Such substitutions are those that substitute a given amino acid in a kinase peptide by another amino acid of like characteristics. Typically seen as conservative substitutions are the replacements, one for another, among the aliphatic amino acids Ala, Val, Leu, and Ile; interchange of the hydroxyl residues Ser and Thr; exchange of the acidic residues Asp and Glu; substitution between the amide residues Asn and Gln; exchange of the basic residues Lys and Arg; and replacements among the aromatic residues Phe and Tyr. Guidance concerning which amino acid changes are likely to be phenotypically silent are found in Bowie et al., Science 247:1306-1310 (1990).

Variant kinase peptides can be fully functional or can lack function in one or more activities, e.g. ability to bind substrate, ability to phosphorylate substrate, ability to mediate signaling, etc. Fully functional variants typically contain only conservative variation or variation in non-critical residues or in non-critical regions. FIG. 2 provides the result of protein analysis and can be used to identify critical domains/regions. Functional variants can also contain substitution of similar amino acids that result in no change or an insignificant change in function. Alternatively, such substitutions may positively or negatively affect function to some degree.
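As a rough illustration of the conservative-substitution classes named in the passage above, the following Python sketch flags each difference between two equal-length peptide strings as conservative or not. It is only a minimal sketch under stated assumptions: the grouping covers just the six replacement classes listed in the text (written in one-letter code), and the function names and example sequences are hypothetical and are not part of the disclosure.

```python
# Residue groups taken from the conservative replacements named in the text,
# written in one-letter amino acid code. Illustrative only.
CONSERVATIVE_GROUPS = [
    {"A", "V", "L", "I"},  # aliphatic: Ala, Val, Leu, Ile
    {"S", "T"},            # hydroxyl: Ser, Thr
    {"D", "E"},            # acidic: Asp, Glu
    {"N", "Q"},            # amide: Asn, Gln
    {"K", "R"},            # basic: Lys, Arg
    {"F", "Y"},            # aromatic: Phe, Tyr
]


def is_conservative(original: str, replacement: str) -> bool:
    """Return True if the two residues differ but share one of the groups."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_substitutions(reference: str, variant: str):
    """List (position, ref_residue, var_residue, conservative?) for each
    mismatch between two ungapped sequences of equal length.
    Positions are 1-based."""
    if len(reference) != len(variant):
        raise ValueError("sequences must be the same length (no gaps handled)")
    return [
        (i + 1, r, v, is_conservative(r, v))
        for i, (r, v) in enumerate(zip(reference, variant))
        if r != v
    ]


if __name__ == "__main__":
    # Hypothetical four-residue example: K->R is within the basic group,
    # S->D crosses groups.
    for row in classify_substitutions("MKLS", "MRLD"):
        print(row)
```

A substitution flagged as non-conservative here is not necessarily loss-of-function; as the text notes, the effect depends on whether the residue lies in a critical region.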




Non-functional variants typically contain one or more non-conservative amino acid substitutions, deletions, insertions, inversions, or truncation or a substitution, insertion, inversion, or deletion in a critical residue or critical region.

Amino acids that are essential for function can be identified by methods known in the art, such as site-directed mutagenesis or alanine-scanning mutagenesis (Cunningham et al., Science 244:1081-1085 (1989)), particularly using the results provided in FIG. 2. The latter procedure introduces single alanine mutations at every residue in the molecule. The resulting mutant molecules are then tested for biological activity such as kinase activity or in assays such as an in vitro proliferative activity. Sites that are critical for binding partner/substrate binding can also be determined by structural analysis such as crystallization, nuclear magnetic resonance or photoaffinity labeling (Smith et al., J. Mol. Biol. 224:899-904 (1992); de Vos et al. Science 255:306-312 (1992)).

The present invention further provides fragments of the kinase peptides, in addition to proteins and peptides that comprise and consist of such fragments, particularly those comprising the residues identified in FIG. 2. The fragments to which the invention pertains, however, are not to be construed as encompassing fragments that may be disclosed publicly prior to the present invention.

As used herein, a fragment comprises at least 8, 10, 12, 14, 16, or more contiguous amino acid residues from a kinase peptide. Such fragments can be chosen based on the ability to retain one or more of the biological activities of the kinase peptide or could be chosen for the ability to perform a function, e.g. bind a substrate or act as an immunogen. Particularly important fragments are biologically active fragments, peptides that are, for example, about 8 or more amino acids in length. Such fragments will typically comprise a domain or motif of the kinase peptide, e.g., active site, a transmembrane domain or a substrate-binding domain. Further, possible fragments include, but are not limited to, domain or motif containing fragments, soluble peptide fragments, and fragments containing immunogenic structures. Predicted domains and functional sites are readily identifiable by computer programs well known and readily available to those of skill in the art (e.g., PROSITE analysis). The results of one such analysis are provided in FIG. 2.

Polypeptides often contain amino acids other than the 20 amino acids commonly referred to as the 20 naturally occurring amino acids. Further, many amino acids, including the terminal amino acids, may be modified by natural processes, such as processing and other post-translational modifications, or by chemical modification techniques well known in the art. Common modifications that occur naturally in kinase peptides are described in basic texts, detailed monographs, and the research literature, and they are well known to those of skill in the art (some of these features are identified in FIG. 2).

Known modifications include, but are not limited to, acetylation, acylation, ADP-ribosylation, amidation, covalent attachment of flavin, covalent attachment of a heme moiety, covalent attachment of a nucleotide or nucleotide derivative, covalent attachment of a lipid or lipid derivative, covalent attachment of phosphotidylinositol, cross-linking, cyclization, disulfide bond formation, demethylation, formation of covalent crosslinks, formation of cystine, formation of pyroglutamate, formylation, gamma carboxylation, glycosylation, GPI anchor formation, hydroxylation, iodination, methylation, myristoylation, oxidation, proteolytic processing, phosphorylation, prenylation, racemization, selenoylation, sulfation, transfer-RNA mediated addition of amino acids to proteins such as arginylation, and ubiquitination.

Such modifications are well known to those of skill in the art and have been described in great detail in the scientific literature. Several particularly common modifications, glycosylation, lipid attachment, sulfation, gamma-carboxylation of glutamic acid residues, hydroxylation and ADP-ribosylation, for instance, are described in most basic texts, such as Proteins: Structure and Molecular Properties, 2nd Ed., T. E. Creighton, W. H. Freeman and Company, New York (1993). Many detailed reviews are available on this subject, such as by Wold, F., Posttranslational Covalent Modification of Proteins, B. C. Johnson, Ed., Academic Press, New York 1-12 (1983); Seifter et al. (Meth. Enzymol. 182: 626-646 (1990)) and Rattan et al. (Ann. N.Y. Acad. Sci. 663:48-62 (1992)).

Accordingly, the kinase peptides of the present invention also encompass derivatives or analogs in which a substituted amino acid residue is not one encoded by the genetic code, in which a substituent group is included, in which the mature kinase peptide is fused with another compound, such as a compound to increase the half-life of the kinase peptide (for example, polyethylene glycol), or in which the additional amino acids are fused to the mature kinase peptide, such as a leader or secretory sequence or a sequence for purification of the mature kinase peptide or a pro-protein sequence.

Protein/Peptide Uses

The proteins of the present invention can be used in substantial and specific assays related to the functional information provided in the Figures; to raise antibodies or to elicit another immune response; as a reagent (including the labeled reagent) in assays designed to quantitatively determine levels of the protein (or its binding partner or ligand) in biological fluids; and as markers for tissues in which the corresponding protein is preferentially expressed (either constitutively or at a particular stage of tissue differentiation or development or in a disease state). Where the protein binds or potentially binds to another protein or ligand (such as, for example, in a kinase-effector protein interaction or kinase-ligand interaction), the protein can be used to identify the binding partner/ligand so as to develop a system to identify inhibitors of the binding interaction. Any or all of these uses are capable of being developed into reagent grade or kit format for commercialization as commercial products.

Methods for performing the uses listed above are well known to those skilled in the art. References disclosing such methods include "Molecular Cloning: A Laboratory Manual", 2d ed., Cold Spring Harbor Laboratory Press, Sambrook, J., E. F. Fritsch and T. Maniatis eds., 1989, and "Methods in Enzymology: Guide to Molecular Cloning Techniques", Academic Press, Berger, S. L. and A. R. Kimmel eds., 1987.

The potential uses of the peptides of the present invention are based primarily on the source of the protein as well as the class/action of the protein. For example, kinases isolated from humans and their human/mammalian orthologs serve as targets for identifying agents for use in mammalian therapeutic applications, e.g. a human drug, particularly in modulating a biological or pathological response in a cell or tissue that expresses the kinase. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant
brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain. A large percentage of pharmaceutical agents are being developed that modulate the activity of kinase proteins, particularly members of the serine/threonine kinase subfamily (see Background of the Invention). The structural and functional information provided in the Background and Figures provide specific and substantial uses for the molecules of the present invention, particularly in combination with the expression information provided in FIG. 1. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. Such uses can readily be determined using the information provided herein, that which is known in the art, and routine experimentation.

The proteins of the present invention (including variants and fragments that may have been disclosed prior to the present invention) are useful for biological assays related to kinases that are related to members of the serine/threonine kinase subfamily. Such assays involve any of the known kinase functions or activities or properties useful for diagnosis and treatment of kinase-related conditions that are specific for the subfamily of kinases that the one of the present invention belongs to, particularly in cells and tissues that express the kinase. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain.

The proteins of the present invention are also useful in drug screening assays, in cell-based or cell-free systems. Cell-based systems can be native, i.e., cells that normally express the kinase, as a biopsy or expanded in cell culture. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. In an alternate embodiment, cell-based assays involve recombinant host cells expressing the kinase protein.

The polypeptides can be used to identify compounds that modulate kinase activity of the protein in its natural state or an altered form that causes a specific disease or pathology associated with the kinase. Both the kinases of the present invention and appropriate variants and fragments can be used in high-throughput screens to assay candidate compounds for the ability to bind to the kinase. These compounds can be further screened against a functional kinase to determine the effect of the compound on the kinase activity. Further, these compounds can be tested in animal or invertebrate systems to determine activity/effectiveness. Compounds can be identified that activate (agonist) or inactivate (antagonist) the kinase to a desired degree.

Further, the proteins of the present invention can be used to screen a compound for the ability to stimulate or inhibit interaction between the kinase protein and a molecule that normally interacts with the kinase protein, e.g. a substrate or a component of the signal pathway that the kinase protein normally interacts (for example, another kinase). Such assays typically include the steps of combining the kinase protein with a candidate compound under conditions that allow the kinase protein, or fragment, to interact with the target molecule, and to detect the formation of a complex between the protein and the target or to detect the biochemical consequence of the interaction with the kinase protein and the target, such as any of the associated effects of signal transduction such as protein phosphorylation, cAMP turnover, and adenylate cyclase activation, etc.

Candidate compounds include, for example, 1) peptides such as soluble peptides, including Ig-tailed fusion peptides and members of random peptide libraries (see, e.g., Lam et al., Nature 354:82-84 (1991); Houghten et al., Nature 354:84-86 (1991)) and combinatorial chemistry-derived molecular libraries made of D- and/or L-configuration amino acids; 2) phosphopeptides (e.g., members of random and partially degenerate, directed phosphopeptide libraries, see, e.g., Songyang et al., Cell 72:767-778 (1993)); 3) antibodies (e.g., polyclonal, monoclonal, humanized, anti-idiotypic, chimeric, and single chain antibodies as well as Fab, F(ab')2, Fab expression library fragments, and epitope-binding fragments of antibodies); and 4) small organic and inorganic molecules (e.g., molecules obtained from combinatorial and natural product libraries).

One candidate compound is a soluble fragment of the receptor that competes for substrate binding. Other candidate compounds include mutant kinases or appropriate fragments containing mutations that affect kinase function and thus compete for substrate. Accordingly, a fragment that competes for substrate, for example with a higher affinity, or a fragment that binds substrate but does not allow release, is encompassed by the invention.

The invention further includes other end point assays to identify compounds that modulate (stimulate or inhibit) kinase activity. The assays typically involve an assay of events in the signal transduction pathway that indicate kinase activity. Thus, the phosphorylation of a substrate, activation of a protein, a change in the expression of genes that are up- or down-regulated in response to the kinase protein dependent signal cascade can be assayed.

Any of the biological or biochemical functions mediated by the kinase can be used as an endpoint assay. These include all of the biochemical or biochemical/biological events described herein, in the references cited herein, incorporated by reference for these endpoint assay targets, and other functions known to those of ordinary skill in the art or that can be readily identified using the information provided in the Figures, particularly FIG. 2. Specifically, a biological function of a cell or tissues that expresses the kinase can be assayed. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain.

Binding and/or activating compounds can also be screened by using chimeric kinase proteins in which the amino terminal extracellular domain, or parts thereof, the entire transmembrane domain or subregions, such as any of the seven transmembrane segments or any of the intracellular or extracellular loops and the carboxy terminal intracellular domain, or parts thereof, can be replaced by heterologous domains or subregions. For example, a substrate-binding region can be used that interacts with a different substrate than that which is recognized by the native kinase. Accordingly, a different set of signal transduction components is available as an end-point assay for activation. This allows for assays to be performed in other than the specific host cell from which the kinase is derived.

The proteins of the present invention are also useful in competition binding assays in methods designed to discover compounds that interact with the kinase (e.g. binding part-
ners and/or ligands). Thus, a compound is exposed to a kinase polypeptide under conditions that allow the compound to bind or to otherwise interact with the polypeptide. Soluble kinase polypeptide is also added to the mixture. If the test compound interacts with the soluble kinase polypeptide, it decreases the amount of complex formed or activity from the kinase target. This type of assay is particularly useful in cases in which compounds are sought that interact with specific regions of the kinase. Thus, the soluble polypeptide that competes with the target kinase region is designed to contain peptide sequences corresponding to the region of interest.

To perform cell free drug screening assays, it is sometimes desirable to immobilize either the kinase protein, or fragment, or its target molecule to facilitate separation of complexes from uncomplexed forms of one or both of the proteins, as well as to accommodate automation of the assay.

Techniques for immobilizing proteins on matrices can be used in the drug screening assays. In one embodiment, a fusion protein can be provided which adds a domain that allows the protein to be bound to a matrix. For example, glutathione-S-transferase fusion proteins can be adsorbed onto glutathione sepharose beads (Sigma Chemical, St. Louis, Mo.) or glutathione derivatized microtitre plates, which are then combined with the cell lysates (e.g., 35S-labeled) and the candidate compound, and the mixture incubated under conditions conducive to complex formation (e.g., at physiological conditions for salt and pH). Following incubation, the beads are washed to remove any unbound label, and the matrix immobilized and radiolabel determined directly, or in the supernatant after the complexes are dissociated. Alternatively, the complexes can be dissociated from the matrix, separated by SDS-PAGE, and the level of kinase-binding protein found in the bead fraction quantitated from the gel using standard electrophoretic techniques. For example, either the polypeptide or its target molecule can be immobilized utilizing conjugation of biotin and streptavidin using techniques well known in the art. Alternatively, antibodies reactive with the protein but which do not interfere with binding of the protein to its target molecule can be derivatized to the wells of the plate, and the protein trapped in the wells by antibody conjugation. Preparations of a kinase-binding protein and a candidate compound are incubated in the kinase protein-presenting wells and the amount of complex trapped in the well can be quantitated. Methods for detecting such complexes, in addition to those described above for the GST-immobilized complexes, include immunodetection of complexes using antibodies reactive with the kinase protein target molecule, or which are reactive with kinase protein and compete with the target molecule, as well as enzyme-linked assays which rely on detecting an enzymatic activity associated with the target molecule.

Agents that modulate one of the kinases of the present invention can be identified using one or more of the above assays, alone or in combination. It is generally preferable to use a cell-based or cell free system first and then confirm activity in an animal or other model system. Such model systems are well known in the art and can readily be employed in this context.

Modulators of kinase protein activity identified according to these drug screening assays can be used to treat a subject with a disorder mediated by the kinase pathway, by treating cells or tissues that express the kinase. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. These methods of treatment include the steps of administering a modulator of kinase activity in a pharmaceutical composition to a subject in need of such treatment, the modulator being identified as described herein.

In yet another aspect of the invention, the kinase proteins can be used as "bait proteins" in a two-hybrid assay or three-hybrid assay (see, e.g., U.S. Pat. No. 5,283,317; Zervos et al. (1993) Cell 72:223-232; Madura et al. (1993) J. Biol. Chem. 268:12046-12054; Bartel et al. (1993) Biotechniques 14:920-924; Iwabuchi et al. (1993) Oncogene 8:1693-1696; and Brent WO94/10300), to identify other proteins, which bind to or interact with the kinase and are involved in kinase activity. Such kinase-binding proteins are also likely to be involved in the propagation of signals by the kinase proteins or kinase targets as, for example, downstream elements of a kinase-mediated signaling pathway. Alternatively, such kinase-binding proteins are likely to be kinase inhibitors.

The two-hybrid system is based on the modular nature of most transcription factors, which consist of separable DNA-binding and activation domains. Briefly, the assay utilizes two different DNA constructs. In one construct, the gene that codes for a kinase protein is fused to a gene encoding the DNA binding domain of a known transcription factor (e.g., GAL-4). In the other construct, a DNA sequence, from a library of DNA sequences, that encodes an unidentified protein ("prey" or "sample") is fused to a gene that codes for the activation domain of the known transcription factor. If the "bait" and the "prey" proteins are able to interact, in vivo, forming a kinase-dependent complex, the DNA-binding and activation domains of the transcription factor are brought into close proximity. This proximity allows transcription of a reporter gene (e.g., LacZ) which is operably linked to a transcriptional regulatory site responsive to the transcription factor. Expression of the reporter gene can be detected and cell colonies containing the functional transcription factor can be isolated and used to obtain the cloned gene which encodes the protein which interacts with the kinase protein.

This invention further pertains to novel agents identified by the above-described screening assays. Accordingly, it is within the scope of this invention to further use an agent identified as described herein in an appropriate animal model. For example, an agent identified as described herein (e.g., a kinase-modulating agent, an antisense kinase nucleic acid molecule, a kinase-specific antibody, or a kinase-binding partner) can be used in an animal or other model to determine the efficacy, toxicity, or side effects of treatment with such an agent. Alternatively, an agent identified as described herein can be used in an animal or other model to determine the mechanism of action of such an agent. Furthermore, this invention pertains to uses of novel agents identified by the above-described screening assays for treatments as described herein.

The kinase proteins of the present invention are also useful to provide a target for diagnosing a disease or predisposition to disease mediated by the peptide. Accordingly, the invention provides methods for detecting the presence, or levels of, the protein (or encoding mRNA) in a cell, tissue, or organism. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. The method involves contacting a biological sample with a compound capable of interacting with the kinase protein such that the interaction can be detected. Such an assay can be provided in a single detection format or a multi-detection format such as an antibody chip array.

One agent for detecting a protein in a sample is an antibody capable of selectively binding to protein. A bio-
logical sample includes tissues, cells and biological fluids isolated from a subject, as well as tissues, cells and fluids present within a subject.

The peptides of the present invention also provide targets for diagnosing active protein activity, disease, or predisposition to disease, in a patient having a variant peptide, particularly activities and conditions that are known for other members of the family of proteins to which the present one belongs. Thus, the peptide can be isolated from a biological sample and assayed for the presence of a genetic mutation that results in aberrant peptide. This includes amino acid substitution, deletion, insertion, rearrangement, (as the result of aberrant splicing events), and inappropriate post-translational modification. Analytic methods include altered electrophoretic mobility, altered tryptic peptide digest, altered kinase activity in cell-based or cell-free assay, alteration in substrate or antibody-binding pattern, altered isoelectric point, direct amino acid sequencing, and any other of the known assay techniques useful for detecting mutations in a protein. Such an assay can be provided in a single detection format or a multi-detection format such as an antibody chip array.

In vitro techniques for detection of peptide include enzyme linked immunosorbent assays (ELISAs), Western blots, immunoprecipitations and immunofluorescence using a detection reagent, such as an antibody or protein binding agent. Alternatively, the peptide can be detected in vivo in a subject by introducing into the subject a labeled anti-peptide antibody or other types of detection agent. For example, the antibody can be labeled with a radioactive marker whose presence and location in a subject can be detected by standard imaging techniques. Particularly useful are methods that detect the allelic variant of a peptide expressed in a subject and methods which detect fragments of a peptide in a sample.

The peptides are also useful in pharmacogenomic analysis. Pharmacogenomics deal with clinically significant hereditary variations in the response to drugs due to altered drug disposition and abnormal action in affected persons. See, e.g., Eichelbaum, M. (Clin. Exp. Pharmacol. Physiol. 23(10-11):983-985 (1996)), and Linder, M. W. (Clin. Chem. 43(2):254-266 (1997)). The clinical outcomes of these variations result in severe toxicity of therapeutic drugs in certain individuals or therapeutic failure of drugs in certain individuals as a result of individual variation in metabolism. Thus, the genotype of the individual can determine the way a therapeutic compound acts on the body or the way the body metabolizes the compound. Further, the activity of drug metabolizing enzymes effects both the intensity and duration of drug action.

more or less active in substrate binding, and kinase activation. Accordingly, substrate dosage would necessarily be modified to maximize the therapeutic effect within a given population containing a polymorphism. As an alternative to genotyping, specific polymorphic peptides could be identified.

The peptides are also useful for treating a disorder characterized by an absence of, inappropriate, or unwanted expression of the protein. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. Accordingly, methods for treatment include the use of the kinase protein or fragments.

Antibodies

The invention also provides antibodies that selectively bind to one of the peptides of the present invention, a protein comprising such a peptide, as well as variants and fragments thereof. As used herein, an antibody selectively binds a target peptide when it binds the target peptide and does not significantly bind to unrelated proteins. An antibody is still considered to selectively bind a peptide even if it also binds to other proteins that are not substantially homologous with the target peptide so long as such proteins share homology with a fragment or domain of the peptide target of the antibody. In this case, it would be understood that antibody binding to the peptide is still selective despite some degree of cross-reactivity.

As used herein, an antibody is defined in terms consistent with that recognized within the art: they are multi-subunit proteins produced by a mammalian organism in response to an antigen challenge. The antibodies of the present invention include polyclonal antibodies and monoclonal antibodies, as well as fragments of such antibodies, including, but not limited to, Fab or F(ab')2, and Fv fragments.

Many methods are known for generating and/or identifying antibodies to a given target peptide. Several such methods are described by Harlow, Antibodies, Cold Spring Harbor (1989).

In general, to generate antibodies, an isolated peptide is used as an immunogen and is administered to a mammalian organism, such as a rat, rabbit or mouse. The full-length protein, an antigenic peptide fragment or a fusion protein can be used. Particularly important fragments are those covering functional domains, such as the domains identified in FIG. 2, and domain of sequence homology or divergence amongst the family, such as those that can readily be identified using protein alignment methods and as presented in the Figures.

Antibodies are preferably prepared from regions or discrete fragments of the kinase proteins. Antibodies can be
individual permit the selection of effective «»P^ wd. £^^ oman y^ 

effective dosages of such compouuo^ forj,rophylacUcor fc ^ ^ ons ^include those involved in 

therapeutic treatment based on the individual s genotype. and/or ^ase/binding partner interaction. 

Tlie discovery of genetic po ymoimmsms m some drug 2 can be used to identify particularly important regions 

metabolizing enzymes has explained why some patients do 55 alignment can be used to identify conserved 

not obtain the expected drug effect* show an exacted JjW 

drug effect, or experience senous toxicity &om standard an ^ g icall ^prfse at least 8 

drug dosages. Polymomhisms can ^ expr^ ^ntb e phe- ^ g, rttidu ^ e l ntlg £ c ^ can 

notype of the extensive metabohzer and the phenotype of the ^ u ~~ wr 9t in 10 14 ifior more amino 

poo^mcUbolizer-Acconiingly . genetic - ^^Si^tS^^^^^ 
lead to .UeUc protein vmante of ±e ^P'™.^ Z^S^Tt^L correspond to regions that are 

one or more of the kinase functions in one population is v™V™y, ™ " of , he 3 te in e e hydrophiUc 

allow a target to ascertain a geneUc predisposition that can reg»n» orw«i « —* -» 

affect treatment modality. Thus, in a ligaod-based treatment, 65 ri\M. &)• ...... .. ,- „,„ t^, 

polymorphism may give rise to anunolerminal extracellular Detection on an^ntibody of the present^ "venhon «n be 
Ens and/or other substrate-binding regions that are facilitated by couphng (,.e., physically tnkmg) the antibody 
to a detectable substance. Examples of detectable substances include various enzymes, prosthetic groups, fluorescent materials, luminescent materials, bioluminescent materials, and radioactive materials. Examples of suitable enzymes include horseradish peroxidase, alkaline phosphatase, β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group complexes include streptavidin/biotin and avidin/biotin; examples of suitable fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate, rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an example of a luminescent material includes luminol; examples of bioluminescent materials include luciferase, luciferin, and aequorin, and examples of suitable radioactive material include ¹²⁵I, ¹³¹I, ³⁵S or ³H.

Antibody Uses

The antibodies can be used to isolate one of the proteins of the present invention by standard techniques, such as affinity chromatography or immunoprecipitation. The antibodies can facilitate the purification of the natural protein from cells and recombinantly produced protein expressed in host cells. In addition, such antibodies are useful to detect the presence of one of the proteins of the present invention in cells or tissues to determine the pattern of expression of the protein among various tissues in an organism and over the course of normal development. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain. Further, such antibodies can be used to detect protein in situ, in vitro, or in a cell lysate or supernatant in order to evaluate the abundance and pattern of expression. Also, such antibodies can be used to assess abnormal tissue distribution or abnormal expression during development or progression of a biological condition. Antibody detection of circulating fragments of the full length protein can be used to identify turnover.

Further, the antibodies can be used to assess expression in disease states such as in active stages of the disease or in an individual with a predisposition toward disease related to the protein's function. When a disorder is caused by an inappropriate tissue distribution, developmental expression, level of expression of the protein, or expressed/processed form, the antibody can be prepared against the normal protein. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. If a disorder is characterized by a specific mutation in the protein, antibodies specific for this mutant protein can be used to assay for the presence of the specific mutant protein.

The antibodies can also be used to assess normal and aberrant subcellular localization of cells in the various tissues in an organism. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. The diagnostic uses can be applied, not only in genetic testing, but also in monitoring a treatment modality. Accordingly, where treatment is ultimately aimed at correcting expression level or the presence of aberrant sequence and aberrant tissue distribution or developmental expression, antibodies directed against the protein or relevant fragments can be used to monitor therapeutic efficacy.

Additionally, antibodies are useful in pharmacogenomic analysis. Thus, antibodies prepared against polymorphic proteins can be used to identify individuals that require modified treatment modalities. The antibodies are also useful as diagnostic tools as an immunological marker for aberrant protein analyzed by electrophoretic mobility, isoelectric point, tryptic peptide digest, and other physical assays known to those in the art.

The antibodies are also useful for tissue typing. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. Thus, where a specific protein has been correlated with expression in a specific tissue, antibodies that are specific for this protein can be used to identify a tissue type.

The antibodies are also useful for inhibiting protein function, for example, blocking the binding of the kinase peptide to a binding partner such as a substrate. These uses can also be applied in a therapeutic context in which treatment involves inhibiting the protein's function. An antibody can be used, for example, to block binding, thus modulating (agonizing or antagonizing) the peptides activity. Antibodies can be prepared against specific fragments containing sites required for function or against intact protein that is associated with a cell or cell membrane. See FIG. 2 for structural information relating to the proteins of the present invention.

The invention also encompasses kits for using antibodies to detect the presence of a protein in a biological sample. The kit can comprise antibodies such as a labeled or labelable antibody and a compound or agent for detecting protein in a biological sample; means for determining the amount of protein in the sample; means for comparing the amount of protein in the sample with a standard; and instructions for use. Such a kit can be supplied to detect a single protein or epitope or can be configured to detect one of a multitude of epitopes, such as in an antibody detection array. Arrays are described in detail below for nucleic acid arrays and similar methods have been developed for antibody arrays.

Nucleic Acid Molecules

The present invention further provides isolated nucleic acid molecules that encode a kinase peptide or protein of the present invention (cDNA, transcript and genomic sequence). Such nucleic acid molecules will consist of, consist essentially of, or comprise a nucleotide sequence that encodes one of the kinase peptides of the present invention, an allelic variant thereof, or an ortholog or paralog thereof.

As used herein, an "isolated" nucleic acid molecule is one that is separated from other nucleic acid present in the natural source of the nucleic acid. Preferably, an "isolated" nucleic acid is free of sequences which naturally flank the nucleic acid (i.e., sequences located at the 5' and 3' ends of the nucleic acid) in the genomic DNA of the organism from which the nucleic acid is derived. However, there can be some flanking nucleotide sequences, for example up to about 5KB, 4KB, 3KB, 2KB, or 1KB or less, particularly contiguous peptide encoding sequences and peptide encoding sequences within the same gene but separated by introns in the genomic sequence. The important point is that the nucleic acid is isolated from remote and unimportant flanking sequences such that it can be subjected to the specific manipulations described herein such as recombinant expression, preparation of probes and primers, and other uses specific to the nucleic acid sequences.

Moreover, an "isolated" nucleic acid molecule, such as a transcript/cDNA molecule, can be substantially free of other cellular material, or culture medium when produced by
recombinant techniques, or chemical precursors or other chemicals when chemically synthesized. However, the nucleic acid molecule can be fused to other coding or regulatory sequences and still be considered isolated.

For example, recombinant DNA molecules contained in a vector are considered isolated. Further examples of isolated DNA molecules include recombinant DNA molecules maintained in heterologous host cells or purified (partially or substantially) DNA molecules in solution. Isolated RNA molecules include in vivo or in vitro RNA transcripts of the isolated DNA molecules of the present invention. Isolated nucleic acid molecules according to the present invention further include such molecules produced synthetically.

Accordingly, the present invention provides nucleic acid molecules that consist of the nucleotide sequence shown in FIG. 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3, genomic sequence), or any nucleic acid molecule that encodes the protein provided in FIG. 2, SEQ ID NO:2. A nucleic acid molecule consists of a nucleotide sequence when the nucleotide sequence is the complete nucleotide sequence of the nucleic acid molecule.

The present invention further provides nucleic acid molecules that consist essentially of the nucleotide sequence shown in FIG. 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3, genomic sequence), or any nucleic acid molecule that encodes the protein provided in FIG. 2, SEQ ID NO:2. A nucleic acid molecule consists essentially of a nucleotide sequence when such a nucleotide sequence is present with only a few additional nucleic acid residues in the final nucleic acid molecule.

The present invention further provides nucleic acid molecules that comprise the nucleotide sequences shown in FIG. 1 or 3 (SEQ ID NO:1, transcript sequence and SEQ ID NO:3, genomic sequence), or any nucleic acid molecule that encodes the protein provided in FIG. 2, SEQ ID NO:2. A nucleic acid molecule comprises a nucleotide sequence when the nucleotide sequence is at least part of the final nucleotide sequence of the nucleic acid molecule. In such a fashion, the nucleic acid molecule can be only the nucleotide sequence or have additional nucleic acid residues, such as nucleic acid residues that are naturally associated with it or heterologous nucleotide sequences. Such a nucleic acid molecule can have a few additional nucleotides or can comprise several hundred or more additional nucleotides. A brief description of how various types of these nucleic acid molecules can be readily made/isolated is provided below.

In FIGS. 1 and 3, both coding and non-coding sequences are provided. Because of the source of the present invention, humans genomic sequence (FIG. 3) and cDNA/transcript sequences (FIG. 1), the nucleic acid molecules in the Figures will contain genomic intronic sequences, 5' and 3' non-coding sequences, gene regulatory regions and non-coding intergenic sequences. In general such sequence features are either noted in FIGS. 1 and 3 or can readily be identified using computational tools known in the art. As discussed below, some of the non-coding regions, particularly gene regulatory elements such as promoters, are useful for a variety of purposes, e.g. control of heterologous gene expression, target for identifying gene activity modulating compounds, and are particularly claimed as fragments of the genomic sequence provided herein.

The isolated nucleic acid molecules can encode the mature protein plus additional amino or carboxyl-terminal amino acids, or amino acids interior to the mature peptide (when the mature form has more than one peptide chain, for instance). Such sequences may play a role in processing of a protein from precursor to a mature form, facilitate protein trafficking, prolong or shorten protein half-life or facilitate manipulation of a protein for assay or production, among other things. As generally is the case in situ, the additional amino acids may be processed away from the mature protein by cellular enzymes.

As mentioned above, the isolated nucleic acid molecules include, but are not limited to, the sequence encoding the kinase peptide alone, the sequence encoding the mature peptide and additional coding sequences, such as a leader or secretory sequence (e.g., a pre-pro or pro-protein sequence), the sequence encoding the mature peptide, with or without the additional coding sequences, plus additional non-coding sequences, for example introns and non-coding 5' and 3' sequences such as transcribed but non-translated sequences that play a role in transcription, mRNA processing (including splicing and polyadenylation signals), ribosome binding and stability of mRNA. In addition, the nucleic acid molecule may be fused to a marker sequence encoding, for example, a peptide that facilitates purification.

Isolated nucleic acid molecules can be in the form of RNA, such as mRNA, or in the form of DNA, including cDNA and genomic DNA obtained by cloning or produced by chemical synthetic techniques or by a combination thereof. The nucleic acid, especially DNA, can be double-stranded or single-stranded. Single-stranded nucleic acid can be the coding strand (sense strand) or the non-coding strand (anti-sense strand).

The invention further provides nucleic acid molecules that encode fragments of the peptides of the present invention as well as nucleic acid molecules that encode obvious variants of the kinase proteins of the present invention that are described above. Such nucleic acid molecules may be naturally occurring, such as allelic variants (same locus), paralogs (different locus), and orthologs (different organism), or may be constructed by recombinant DNA methods or by chemical synthesis. Such non-naturally occurring variants may be made by mutagenesis techniques, including those applied to nucleic acid molecules, cells, or organisms. Accordingly, as discussed above, the variants can contain nucleotide substitutions, deletions, inversions and insertions. Variation can occur in either or both the coding and non-coding regions. The variations can produce both conservative and non-conservative amino acid substitutions.

The present invention further provides non-coding fragments of the nucleic acid molecules provided in FIGS. 1 and 3. Preferred non-coding fragments include, but are not limited to, promoter sequences, enhancer sequences, gene modulating sequences and gene termination sequences. Such fragments are useful in controlling heterologous gene expression and in developing screens to identify gene-modulating agents. A promoter can readily be identified as being 5' to the ATG start site in the genomic sequence provided in FIG. 3.

A fragment comprises a contiguous nucleotide sequence greater than 12 or more nucleotides. Further, a fragment could be at least 30, 40, 50, 100, 250 or 500 nucleotides in length. The length of the fragment will be based on its intended use. For example, the fragment can encode epitope bearing regions of the peptide, or can be useful as DNA probes and primers. Such fragments can be isolated using the known nucleotide sequence to synthesize an oligonucleotide probe. A labeled probe can then be used to screen a cDNA library, genomic DNA library, or mRNA to isolate nucleic acid corresponding to the coding region. Further, primers can be used in PCR reactions to clone specific regions of gene.
A probe/primer typically comprises substantially a purified oligonucleotide or oligonucleotide pair. The oligonucleotide typically comprises a region of nucleotide sequence that hybridizes under stringent conditions to at least about 12, 20, 25, 40, 50 or more consecutive nucleotides.

Orthologs, homologs, and allelic variants can be identified using methods well known in the art. As described in the Peptide Section, these variants comprise a nucleotide sequence encoding a peptide that is typically 60-70%, 70-80%, 80-90%, and more typically at least about 90-95% or more homologous to the nucleotide sequence shown in the Figure sheets or a fragment of this sequence. Such nucleic acid molecules can readily be identified as being able to hybridize under moderate to stringent conditions, to the nucleotide sequence shown in the Figure sheets or a fragment of the sequence. Allelic variants can readily be determined by genetic locus of the encoding gene. The gene encoding the novel kinase protein of the present invention is located on a genome component that has been mapped to human chromosome 22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as STS and BAC map data.

FIG. 3 provides information on SNPs that have been found in the gene encoding the kinase protein of the present invention. SNPs were identified at 42 different nucleotide positions. Some of these SNPs, which are located outside the ORF and in introns, may affect gene transcription.

As used herein, the term "hybridizes under stringent conditions" is intended to describe conditions for hybridization and washing under which nucleotide sequences encoding a peptide at least 60-70% homologous to each other typically remain hybridized to each other. The conditions can be such that sequences at least about 60%, at least about 70%, or at least about 80% or more homologous to each other typically remain hybridized to each other. Such stringent conditions are known to those skilled in the art and can be found in Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6. One example of stringent hybridization conditions is hybridization in 6x sodium chloride/sodium citrate (SSC) at about 45° C., followed by one or more washes in 0.2xSSC, 0.1% SDS at 50-65° C. Examples of moderate to low stringency hybridization conditions are well known in the art.

Nucleic Acid Molecule Uses

The nucleic acid molecules of the present invention are useful for probes, primers, chemical intermediates, and in biological assays. The nucleic acid molecules are useful as a hybridization probe for messenger RNA, transcript/cDNA and genomic DNA to isolate full-length cDNA and genomic clones encoding the peptide described in FIG. 2 and to isolate cDNA and genomic clones that correspond to variants (alleles, orthologs, etc.) producing the same or related peptides shown in FIG. 2. As illustrated in FIG. 3, SNPs were identified at 42 different nucleotide positions.

The probe can correspond to any sequence along the entire length of the nucleic acid molecules provided in the Figures. Accordingly, it could be derived from 5' noncoding regions, the coding region, and 3' noncoding regions. However, as discussed, fragments are not to be construed as encompassing fragments disclosed prior to the present invention.

The nucleic acid molecules are also useful as primers for PCR to amplify any given region of a nucleic acid molecule and are useful to synthesize antisense molecules of desired length and sequence.

The nucleic acid molecules are also useful for constructing recombinant vectors. Such vectors include expression vectors that express a portion of, or all of, the peptide sequences. Vectors also include insertion vectors, used to integrate into another nucleic acid molecule sequence, such as into the cellular genome, to alter in situ expression of a gene and/or gene product. For example, an endogenous coding sequence can be replaced via homologous recombination with all or part of the coding region containing one or more specifically introduced mutations.

The nucleic acid molecules are also useful for expressing antigenic portions of the proteins.

The nucleic acid molecules are also useful as probes for determining the chromosomal positions of the nucleic acid molecules by means of in situ hybridization methods. The gene encoding the novel kinase protein of the present invention is located on a genome component that has been mapped to human chromosome 22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as STS and BAC map data.

The nucleic acid molecules are also useful in making vectors containing the gene regulatory regions of the nucleic acid molecules of the present invention.

The nucleic acid molecules are also useful for designing ribozymes corresponding to all, or a part, of the mRNA produced from the nucleic acid molecules described herein.

The nucleic acid molecules are also useful for making vectors that express part, or all, of the peptides.

The nucleic acid molecules are also useful for constructing host cells expressing a part, or all, of the nucleic acid molecules and peptides.

The nucleic acid molecules are also useful for constructing transgenic animals expressing all, or a part, of the nucleic acid molecules and peptides.

The nucleic acid molecules are also useful as hybridization probes for determining the presence, level, form and distribution of nucleic acid expression. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain. Accordingly, the probes can be used to detect the presence of, or to determine levels of, a specific nucleic acid molecule in cells, tissues, and in organisms. The nucleic acid whose level is determined can be DNA or RNA. Accordingly, probes corresponding to the peptides described herein can be used to assess expression and/or gene copy number in a given cell, tissue, or organism. These uses are relevant for diagnosis of disorders involving an increase or decrease in kinase protein expression relative to normal results.

In vitro techniques for detection of mRNA include Northern hybridizations and in situ hybridizations. In vitro techniques for detecting DNA include Southern hybridizations and in situ hybridization.

Probes can be used as a part of a diagnostic test kit for identifying cells or tissues that express a kinase protein, such as by measuring a level of a kinase-encoding nucleic acid in a sample of cells from a subject e.g., mRNA or genomic DNA, or determining if a kinase gene has been mutated. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by
virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain.

Nucleic acid expression assays are useful for drug screening to identify compounds that modulate kinase nucleic acid expression.

The invention thus provides a method for identifying a compound that can be used to treat a disorder associated with nucleic acid expression of the kinase gene, particularly biological and pathological processes that are mediated by the kinase in cells and tissues that express it. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland. The method typically includes assaying the ability of the compound to modulate the expression of the kinase nucleic acid and thus identifying a compound that can be used to treat a disorder characterized by undesired kinase nucleic acid expression. The assays can be performed in cell-based and cell-free systems. Cell-based assays include cells naturally expressing the kinase nucleic acid or recombinant cells genetically engineered to express specific nucleic acid sequences.

The assay for kinase nucleic acid expression can involve direct assay of nucleic acid levels, such as mRNA levels, or on collateral compounds involved in the signal pathway. Further, the expression of genes that are up- or down-regulated in response to the kinase protein signal pathway can also be assayed. In this embodiment the regulatory regions of these genes can be operably linked to a reporter gene such as luciferase.

Thus, modulators of kinase gene expression can be identified in a method wherein a cell is contacted with a candidate compound and the expression of mRNA determined. The level of expression of kinase mRNA in the presence of the candidate compound is compared to the level of expression of kinase mRNA in the absence of the candidate compound. The candidate compound can then be identified as a modulator of nucleic acid expression based on this comparison and be used, for example to treat a disorder characterized by aberrant nucleic acid expression. When expression of mRNA is statistically significantly greater in the presence of the candidate compound than in its absence, the candidate compound is identified as a stimulator of nucleic acid expression. When nucleic acid expression is statistically significantly less in the presence of the candidate compound than in its absence, the candidate compound is identified as an inhibitor of nucleic acid expression.

The invention further provides methods of treatment, with the nucleic acid as a target, using a compound identified through drug screening as a gene modulator to modulate kinase nucleic acid expression in cells and tissues that express the kinase. Experimental data as provided in FIG. 1 indicates that the kinase proteins of the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant brain, and thyroid gland, as indicated by virtual northern blot analysis. In addition, PCR-based tissue screening panels indicate expression in fetal brain. Modulation includes both up-regulation (i.e. activation or agonization) and down-regulation (suppression or antagonization) of nucleic acid expression.

Alternatively, a modulator for kinase nucleic acid expression can be a small molecule or drug identified using the screening assays described herein as long as the drug or small molecule inhibits the kinase nucleic acid expression in the cells and tissues that express the protein. Experimental data as provided in FIG. 1 indicates expression in humans in teratocarcinoma, ovary, testis, nervous tissue, bladder, infant and fetal brain, and thyroid gland.

The nucleic acid molecules are also useful for monitoring the effectiveness of modulating compounds on the expression or activity of the kinase gene in clinical trials or in a treatment regimen. Thus, the gene expression pattern can serve as a barometer for the continuing effectiveness of treatment with the compound, particularly with compounds to which a patient can develop resistance. The gene expression pattern can also serve as a marker indicative of a physiological response of the affected cells to the compound. Accordingly, such monitoring would allow either increased administration of the compound or the administration of alternative compounds to which the patient has not become resistant. Similarly, if the level of nucleic acid expression falls below a desirable level, administration of the compound could be commensurately decreased.

The nucleic acid molecules are also useful in diagnostic assays for qualitative changes in kinase nucleic acid expression, and particularly in qualitative changes that lead to pathology. The nucleic acid molecules can be used to detect mutations in kinase genes and gene expression products such as mRNA. The nucleic acid molecules can be used as hybridization probes to detect naturally occurring genetic mutations in the kinase gene and thereby to determine whether a subject with the mutation is at risk for a disorder caused by the mutation. Mutations include deletion, addition, or substitution of one or more nucleotides in the gene, chromosomal rearrangement, such as inversion or transposition, modification of genomic DNA, such as aberrant methylation patterns or changes in gene copy number, such as amplification. Detection of a mutated form of the kinase gene associated with a dysfunction provides a diagnostic tool for an active disease or susceptibility to disease when the disease results from overexpression, underexpression, or altered expression of a kinase protein.

Individuals carrying mutations in the kinase gene can be detected at the nucleic acid level by a variety of techniques. FIG. 3 provides information on SNPs that have been found in the gene encoding the kinase protein of the present invention. SNPs were identified at 42 different nucleotide positions. Some of these SNPs, which are located outside the ORF and in introns, may affect gene transcription. The gene encoding the novel kinase protein of the present invention is located on a genome component that has been mapped to human chromosome 22 (as indicated in FIG. 3), which is supported by multiple lines of evidence, such as STS and BAC map data. Genomic DNA can be analyzed directly or can be amplified by using PCR prior to analysis. RNA or cDNA can be used in the same way. In some uses, detection of the mutation involves the use of a probe/primer in a polymerase chain reaction (PCR) (see, e.g. U.S. Pat. Nos. 4,683,195 and 4,683,202), such as anchor PCR or RACE PCR, or, alternatively, in a ligation chain reaction (LCR) (see, e.g., Landegran et al., Science 241:1077-1080 (1988); and Nakazawa et al., PNAS 91:360-364 (1994)), the latter of which can be particularly useful for detecting point mutations in the gene (see Abravaya et al., Nucleic Acids Res. 23:675-682 (1995)). This method can include the steps of collecting a sample of cells from a patient, isolating nucleic acid (e.g., genomic, mRNA or both) from the cells of the sample, contacting the nucleic acid sample with one or more primers which specifically hybridize to a gene under conditions such that hybridization and amplification of the gene (if present) occurs, and detecting the presence or absence of an amplification product, or detecting the size of the amplification product and comparing the length to a control sample. Deletions and insertions can be detected by a change in size of the amplified product compared to the normal
genotype. Point mutations can be identified by hybridizing amplified DNA to normal RNA or
antisense DNA sequences.

Alternatively, mutations in a kinase gene can be directly identified, for example, by alterations
in restriction enzyme digestion patterns determined by gel electrophoresis.

Further, sequence-specific ribozymes (U.S. Pat. No. 5,498,531) can be used to score for the
presence of specific mutations by development or loss of a ribozyme cleavage site. Perfectly
matched sequences can be distinguished from mismatched sequences by nuclease cleavage digestion
assays or by differences in melting temperature.

Sequence changes at specific locations can also be assessed by nuclease protection assays such as
RNase and S1 protection or the chemical cleavage method. Furthermore, sequence differences
between a mutant kinase gene and a wild-type gene can be determined by direct DNA sequencing. A
variety of automated sequencing procedures can be utilized when performing the diagnostic assays
(Naeve, C. W., (1995) Biotechniques 19:448), including sequencing by mass spectrometry (see,
e.g., PCT International Publication No. WO 94/16101; Cohen et al., Adv. Chromatogr. 36:127-162
(1996); and Griffin et al., Appl. Biochem. Biotechnol. 38:147-159 (1993)).

Other methods for detecting mutations in the gene include methods in which protection from
cleavage agents is used to detect mismatched bases in RNA/RNA or RNA/DNA duplexes (Myers et al.,
Science 230:1242 (1985)); Cotton et al., PNAS 85:4397 (1988); Saleeba et al., Meth. Enzymol.
217:286-295 (1992)), electrophoretic mobility of mutant and wild type nucleic acid is compared
(Orita et al., PNAS 86:2766 (1989); Cotton et al., Mutat. Res. 285:125-144 (1993); and Hayashi et
al., Genet. Anal. Tech. Appl. 9:73-79 (1992)), and movement of mutant or wild-type fragments in
polyacrylamide gels containing a gradient of denaturant is assayed using denaturing gradient gel
electrophoresis (Myers et al., Nature 313:495 (1985)). Examples of other techniques for detecting
point mutations include selective oligonucleotide hybridization, selective amplification, and
selective primer extension.

The nucleic acid molecules are also useful for testing an individual for a genotype that while
not necessarily causing the disease, nevertheless affects the treatment modality. Thus, the
nucleic acid molecules can be used to study the relationship between an individual's genotype and
the individual's response to a compound used for treatment (pharmacogenomic relationship).
Accordingly, the nucleic acid molecules described herein can be used to assess the mutation
content of the kinase gene in an individual in order to select an appropriate compound or dosage
regimen for treatment. FIG. 3 provides information on SNPs that have been found in the gene
encoding the kinase protein of the present invention. SNPs were identified at 42 different
nucleotide positions. Some of these SNPs, which are located outside the ORF and in introns, may
affect gene transcription.

Thus nucleic acid molecules displaying genetic variations that affect treatment provide a
diagnostic target that can be used to tailor treatment in an individual. Accordingly, the
production of recombinant cells and animals containing these polymorphisms allow effective
clinical design of treatment compounds and dosage regimens.

The nucleic acid molecules are thus useful as antisense constructs to control kinase gene
expression in cells, tissues, and organisms. A DNA antisense nucleic acid molecule is designed to
be complementary to a region of the gene involved in transcription, preventing transcription and
hence production of kinase protein. An antisense RNA or DNA nucleic acid molecule would hybridize
to the mRNA and thus block translation of mRNA into kinase protein.

Alternatively, a class of antisense molecules can be used to inactivate mRNA in order to decrease
expression of kinase nucleic acid. Accordingly, these molecules can treat a disorder
characterized by abnormal or undesired kinase nucleic acid expression. This technique involves
cleavage by means of ribozymes containing nucleotide sequences complementary to one or more
regions in the mRNA that attenuate the ability of the mRNA to be translated. Possible regions
include coding regions and particularly coding regions corresponding to the catalytic and other
functional activities of the kinase protein, such as substrate binding.

The nucleic acid molecules also provide vectors for gene therapy in patients containing cells
that are aberrant in kinase gene expression. Thus, recombinant cells, which include the patient's
cells that have been engineered ex vivo and returned to the patient, are introduced into an
individual where the cells produce the desired kinase protein to treat the individual.

The invention also encompasses kits for detecting the presence of a kinase nucleic acid in a
biological sample. Experimental data as provided in FIG. 1 indicates that the kinase proteins of
the present invention are expressed in humans in teratocarcinoma, ovary, testis, nervous tissue,
bladder, infant brain, and thyroid gland, as indicated by virtual northern blot analysis. In
addition, PCR-based tissue screening panels indicate expression in fetal brain. For example, the
kit can comprise reagents such as a labeled or labelable nucleic acid or agent capable of
detecting kinase nucleic acid in a biological sample; means for determining the amount of kinase
nucleic acid in the sample; and means for comparing the amount of kinase nucleic acid in the
sample with a standard. The compound or agent can be packaged in a suitable container. The kit
can further comprise instructions for using the kit to detect kinase protein mRNA or DNA.

Nucleic Acid Arrays

The present invention further provides nucleic acid detection kits, such as arrays or microarrays
of nucleic acid molecules that are based on the sequence information provided in FIGS. 1 and 3
(SEQ ID NOS:1 and 3).

As used herein "Arrays" or "Microarrays" refers to an array of distinct polynucleotides or
oligonucleotides synthesized on a substrate, such as paper, nylon or other type of membrane,
filter, chip, glass slide, or any other suitable solid support. In one embodiment, the microarray
is prepared and used according to the methods described in U.S. Pat. No. 5,837,832, Chee et al.,
PCT application WO95/11995 (Chee et al.), Lockhart, D. J. et al. (1996; Nat. Biotech. 14:
1675-1680) and Schena, M. et al. (1996; Proc. Natl. Acad. Sci. 93: 10614-10619), all of which are
incorporated herein in their entirety by reference. In other embodiments, such arrays are
produced by the methods described by Brown et al., U.S. Pat. No. 5,807,522.

The microarray or detection kit is preferably composed of a large number of unique,
single-stranded nucleic acid sequences, usually either synthetic antisense oligonucleotides or
fragments of cDNAs, fixed to a solid support. The oligonucleotides are preferably about 6-60
nucleotides in length, more preferably 15-30 nucleotides in length, and most preferably about
20-25 nucleotides in length. For a certain type of microarray or detection kit, it may be
otides that are specific to a gene or genes of interest.

In order to produce oligonucleotides to a known sequence for a microarray or detection kit, the
gene(s) of interest (or an ORF identified from the contigs of the present invention) is typically
examined using a computer algorithm which starts at the 5' or at the 3' end of the nucleotide
sequence. Typical algorithms will then identify oligomers of defined length that are unique to
the gene, have a GC content within a range suitable for hybridization, and lack predicted
secondary structure that may interfere with hybridization. In certain situations it may be
appropriate to use pairs of oligonucleotides on a microarray or detection kit. The "pairs" will
be identical, except for one nucleotide that preferably is located in the center of the sequence.
The second oligonucleotide in the pair (mismatched by one) serves as a control. The number of
oligonucleotide pairs may range from two to one million. The oligomers are synthesized at
designated areas on a substrate using a light-directed chemical process. The substrate may be
paper, nylon or other type of membrane, filter, chip, glass slide or any other suitable solid
support.

In another aspect, an oligonucleotide may be synthesized on the surface of the substrate by using
a chemical coupling procedure and an ink jet application apparatus, as described in PCT
application WO95/251116 (Baldeschweiler et al.) which is incorporated herein in its entirety by
reference. In another aspect, a "gridded" array analogous to a dot (or slot) blot may be used to
arrange and link cDNA fragments or oligonucleotides to the surface of a substrate using a vacuum
system, thermal, UV, mechanical or chemical bonding procedures. An array, such as those described
above, may be produced by hand or by using available devices (slot blot or dot blot apparatus),
materials (any suitable solid support), and machines (including robotic instruments), and may
contain 8, 24, 96, 384, 1536, 6144 or more oligonucleotides, or any other number between two and
one million which lends itself to the efficient use of commercially available instrumentation.

In order to conduct sample analysis using a microarray or detection kit, the RNA or DNA from a
biological sample is made into hybridization probes. The mRNA is isolated, and cDNA is produced
and used as a template to make antisense RNA (aRNA). The aRNA is amplified in the presence of
fluorescent nucleotides, and labeled probes are incubated with the microarray or detection kit so
that the probe sequences hybridize to complementary oligonucleotides of the microarray or
detection kit. Incubation conditions are adjusted so that hybridization occurs with precise
complementary matches or with various degrees of less complementarity. After removal of
nonhybridized probes, a scanner is used to determine the levels and patterns of fluorescence. The
scanned images are examined to determine degree of complementarity and the relative abundance of
each oligonucleotide sequence on the microarray or detection kit. The biological samples may be
obtained from any bodily fluids (such as blood, urine, saliva, phlegm, gastric juices, etc.),
cultured cells, biopsies or other tissue preparations. A detection system may be used to measure
the absence, presence, and amount of hybridization for all of the distinct sequences
simultaneously. This data may be used for large-scale correlation studies on the sequences,
expression patterns, mutations, variants, or polymorphisms among samples.

of the kinase gene of the present invention. FIG. 3 provides information on SNPs that have been
found in the gene encoding the kinase protein of the present invention. SNPs were identified at
42 different nucleotide positions. Some of these SNPs, which are located outside the ORF and in
introns, may affect gene transcription.

Conditions for incubating a nucleic acid molecule with a test sample vary. Incubation conditions
depend on the format employed in the assay, the detection methods employed, and the type and
nature of the nucleic acid molecule used in the assay. One skilled in the art will recognize that
any one of the commonly available hybridization, amplification or array assay formats can readily
be adapted to employ the novel fragments of the Human genome disclosed herein. Examples of such
assays can be found in Chard, T, An Introduction to Radioimmunoassay and Related Techniques,
Elsevier Science Publishers, Amsterdam, The Netherlands (1986); Bullock, G. R. et al., Techniques
in Immunocytochemistry, Academic Press, Orlando, Fla. Vol. 1 (1982), Vol. 2 (1983), Vol. 3
(1985); Tijssen, P., Practice and Theory of Enzyme Immunoassays: Laboratory Techniques in
Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands
(1985).

The test samples of the present invention include cells, protein or membrane extracts of cells.
The test sample used in the above-described method will vary based on the assay format, nature of
the detection method and the tissues, cells or extracts used as the sample to be assayed. Methods
for preparing nucleic acid extracts or of cells are well known in the art and can readily be
adapted in order to obtain a sample that is compatible with the system utilized.

In another embodiment of the present invention, kits are provided which contain the necessary
reagents to carry out the assays of the present invention.

Specifically, the invention provides a compartmentalized kit to receive, in close confinement,
one or more containers which comprises: (a) a first container comprising one of the nucleic acid
molecules that can bind to a fragment of the Human genome disclosed herein; and (b) one or more
other containers comprising one or more of the following: wash reagents, reagents capable of
detecting presence of a bound nucleic acid.

In detail, a compartmentalized kit includes any kit in which reagents are contained in separate
containers. Such containers include small glass containers, plastic containers, strips of
plastic, glass or paper, or arraying material such as silica. Such containers allow one to
efficiently transfer reagents from one compartment to another compartment such that the samples
and reagents are not cross-contaminated, and the agents or solutions of each container can be
added in a quantitative fashion from one compartment to another. Such containers will include a
container which will accept the test sample, a container which contains the nucleic acid probe,
containers which contain wash reagents (such as phosphate buffered saline, Tris-buffers, etc.),
and containers which contain the reagents used to detect the bound probe. One skilled in the art
will readily recognize that the previously unidentified kinase gene of the present invention can
be routinely identified using the sequence information disclosed herein can be readily
incorporated into one of the established kit formats which are well known in the art,
particularly expression arrays.
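The oligomer selection described in the Nucleic Acid Arrays passage above (scan from the 5' end,
keep oligomers of defined length that are unique, have a GC content within a suitable range, and
lack secondary structure, then pair each with a control mismatched at the center) can be
illustrated with a minimal Python sketch. It is not taken from the specification: the function
names, the 40-60% GC window, the 4-base stem used as a crude hairpin test, and the choice of the
complementary base for the mismatch are all assumptions made only for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def has_hairpin(seq: str, stem: int = 4) -> bool:
    """Crude secondary-structure proxy (an assumption, not a folding model):
    True if a stem-length word is followed later by its reverse complement."""
    for i in range(len(seq) - 2 * stem + 1):
        word = seq[i:i + stem]
        if reverse_complement(word) in seq[i + stem:]:
            return True
    return False


def mismatch_control(oligo: str) -> str:
    """Copy of oligo whose center base is replaced by its complement."""
    mid = len(oligo) // 2
    return oligo[:mid] + oligo[mid].translate(COMPLEMENT) + oligo[mid + 1:]


def select_oligomers(
    gene: str,
    length: int = 20,
    gc_range: Tuple[float, float] = (0.4, 0.6),
    background: Iterable[str] = (),
) -> List[Tuple[str, str]]:
    """Return (oligomer, mismatch control) pairs, scanning from the 5' end."""
    gene = gene.upper()
    others = [b.upper() for b in background]
    # Count every window once so repeated oligomers can be rejected.
    counts = Counter(gene[i:i + length] for i in range(len(gene) - length + 1))
    pairs = []
    for start in range(len(gene) - length + 1):
        oligo = gene[start:start + length]
        if set(oligo) - set("ACGT"):
            continue  # skip ambiguous bases
        if counts[oligo] != 1 or any(oligo in b for b in others):
            continue  # not unique to the gene
        if not gc_range[0] <= gc_fraction(oligo) <= gc_range[1]:
            continue  # GC content outside the chosen window
        if has_hairpin(oligo):
            continue  # possible self-complementary structure
        pairs.append((oligo, mismatch_control(oligo)))
    return pairs
```

Calling `select_oligomers(sequence, length=25)` on a cDNA string would return candidate pairs in
5' to 3' order; a real design tool would replace the hairpin proxy with a thermodynamic folding
prediction and would test uniqueness against a genome-wide background.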
Vectors/host Cells

The invention also provides vectors containing the nucleic acid molecules described herein. The
term "vector" refers to a vehicle, preferably a nucleic acid molecule, which can transport the
nucleic acid molecules. When the vector is a nucleic acid molecule, the nucleic acid molecules
are covalently linked to the vector nucleic acid. With this aspect of the invention, the vector
includes a plasmid, single or double stranded phage, a single or double stranded RNA or DNA viral
vector, or artificial chromosome, such as a BAC, PAC, YAC, or MAC.

A vector can be maintained in the host cell as an extrachromosomal element where it replicates
and produces additional copies of the nucleic acid molecules. Alternatively, the vector may
integrate into the host cell genome and produce additional copies of the nucleic acid molecules
when the host cell replicates.

The invention provides vectors for the maintenance (cloning vectors) or vectors for expression
(expression vectors) of the nucleic acid molecules. The vectors can function in prokaryotic or
eukaryotic cells or in both (shuttle vectors).

Expression vectors contain cis-acting regulatory regions that are operably linked in the vector
to the nucleic acid molecules such that transcription of the nucleic acid molecules is allowed in
a host cell. The nucleic acid molecules can be introduced into the host cell with a separate
nucleic acid molecule capable of affecting transcription. Thus, the second nucleic acid molecule
may provide a trans-acting factor interacting with the cis-regulatory control region to allow
transcription of the nucleic acid molecules from the vector. Alternatively, a trans-acting factor
may be supplied by the host cell. Finally, a trans-acting factor can be produced from the vector
itself. It is understood, however, that in some embodiments, transcription and/or translation of
the nucleic acid molecules can occur in a cell-free system.

The regulatory sequence to which the nucleic acid molecules described herein can be operably
linked include promoters for directing mRNA transcription. These include, but are not limited to,
the left promoter from bacteriophage λ, the lac, TRP, and TAC promoters from E. coli, the early
and late promoters from SV40, the CMV immediate early promoter, the adenovirus early and late
promoters, and retrovirus long-terminal repeats.

In addition to control regions that promote transcription, expression vectors may also include
regions that modulate transcription, such as repressor binding sites and enhancers. Examples
include the SV40 enhancer, the cytomegalovirus immediate early enhancer, polyoma enhancer,
adenovirus enhancers, and retrovirus LTR enhancers.

In addition to containing sites for transcription initiation and control, expression vectors can
also contain sequences necessary for transcription termination and, in the transcribed region a
ribosome binding site for translation. Other regulatory control elements for expression include
initiation and termination codons as well as polyadenylation signals. The person of ordinary
skill in the art would be aware of the numerous regulatory sequences that are useful in
expression vectors. Such regulatory sequences are described, for example, in Sambrook et al.,
Molecular Cloning: A Laboratory Manual 2nd. ed., Cold Spring Harbor Laboratory Press, Cold Spring
Harbor, N.Y., (1989).

A variety of expression vectors can be used to express a nucleic acid molecule. Such vectors
include chromosomal, episomal, and virus-derived vectors, for example vectors derived from
bacterial plasmids, from bacteriophage, from yeast episomes, from yeast chromosomal elements,
including yeast artificial chromosomes, from viruses such as baculoviruses, papovaviruses such as
SV40, Vaccinia viruses, adenoviruses, poxviruses, pseudorabies viruses, and retroviruses. Vectors
may also be derived from combinations of these sources such as those derived from plasmid and
bacteriophage genetic elements, e.g. cosmids and phagemids. Appropriate cloning and expression
vectors for prokaryotic and eukaryotic hosts are described in Sambrook et al., Molecular Cloning:
A Laboratory Manual 2nd. ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.,
(1989).

The regulatory sequence may provide constitutive expression in one or more host cells (i.e.
tissue specific) or may provide for inducible expression in one or more cell types such as by
temperature, nutrient additive, or exogenous factor such as a hormone or other ligand. A variety
of vectors providing for constitutive and inducible expression in prokaryotic and eukaryotic
hosts are well known to those of ordinary skill in the art.

The nucleic acid molecules can be inserted into the vector nucleic acid by well-known
methodology. Generally, the DNA sequence that will ultimately be expressed is joined to an
expression vector by cleaving the DNA sequence and the expression vector with one or more
restriction enzymes and then ligating the fragments together. Procedures for restriction enzyme
digestion and ligation are well known to those of ordinary skill in the art.

The vector containing the appropriate nucleic acid molecule can be introduced into an appropriate
host cell for propagation or expression using well-known techniques. Bacterial cells include, but
are not limited to, E. coli, Streptomyces, and Salmonella typhimurium. Eukaryotic cells include,
but are not limited to, yeast, insect cells such as Drosophila, animal cells such as COS and CHO
cells, and plant cells.

As described herein, it may be desirable to express the peptide as a fusion protein. Accordingly,
the invention provides fusion vectors that allow for the production of the peptides. Fusion
vectors can increase the expression of a recombinant protein, increase the solubility of the
recombinant protein, and aid in the purification of the protein by acting for example as a ligand
for affinity purification. A proteolytic cleavage site may be introduced at the junction of the
fusion moiety so that the desired peptide can ultimately be separated from the fusion moiety.
Proteolytic enzymes include, but are not limited to, factor Xa, thrombin, and enterokinase.
Typical fusion expression vectors include pGEX (Smith et al., Gene 67:31-40 (1988)), pMAL (New
England Biolabs, Beverly, Mass.) and pRIT5 (Pharmacia, Piscataway, N.J.) which fuse glutathione
S-transferase (GST), maltose E binding protein, or protein A, respectively, to the target
recombinant protein. Examples of suitable inducible non-fusion E. coli expression vectors include
pTrc (Amann et al., Gene 69:301-315 (1988)) and pET 11d (Studier et al., Gene Expression
Technology: Methods in Enzymology 185:60-89 (1990)).

Recombinant protein expression can be maximized in host bacteria by providing a genetic
background wherein the host cell has an impaired capacity to proteolytically cleave the
recombinant protein. (Gottesman, S., Gene Expression Technology: Methods in Enzymology 185,
Academic Press, San Diego, Calif. (1990) 119-128). Alternatively, the sequence of the nucleic
acid molecule of interest can be altered to provide preferential codon usage for a specific host
cell, for example E. coli. (Wada et al., Nucleic Acids Res. 20:2111-2118 (1992)).

The nucleic acid molecules can also be expressed by expression vectors that are operative in
yeast. Examples of vectors for expression in yeast e.g., S. cerevisiae include pYepSec1 (Baldari,
et al., EMBO J. 6:229-234 (1987)), pMFa (Kurjan et al., Cell 30:933-943 (1982)), pJRY88 (Schultz
et al., Gene 54:113-123 (1987)), and pYES2 (Invitrogen Corporation, San Diego, Calif.).
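The alteration of a coding sequence to provide preferential codon usage for a host cell, mentioned
above for E. coli, can be illustrated with a short Python sketch. It only shows the idea of
swapping synonymous codons while leaving the encoded protein unchanged. The preference table
`PREFERRED` is a hypothetical placeholder covering three amino acids; it is not taken from Wada et
al. or from the specification, and the function names are likewise assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical host preference table (amino acid -> preferred codon).
# Placeholder values for illustration only; a real table would come from
# measured codon usage for the chosen host.
PREFERRED = {"L": "CTG", "K": "AAA", "E": "GAA"}


def translate(cds: str) -> str:
    """Translate a coding sequence into one-letter amino acids ('*' = stop)."""
    cds = cds.upper()
    return "".join(CODON_TO_AA[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds: str, preferred: dict = PREFERRED) -> str:
    """Replace each codon by the preferred synonymous codon, if one is listed."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    for aa, codon in preferred.items():
        if CODON_TO_AA[codon] != aa:
            raise ValueError(f"{codon} does not encode {aa}")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        out.append(preferred.get(CODON_TO_AA[codon], codon))
    return "".join(out)
```

Because every substitution is synonymous, `translate(recode(cds))` equals `translate(cds)` for any
in-frame sequence, which is a convenient check before a recoded sequence is used.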
The nucleic acid molecules can also be expressed in insect cells using, for example, baculovirus
expression vectors. Baculovirus vectors available for expression of proteins in cultured insect
cells (e.g., Sf 9 cells) include the pAc series (Smith et al., Mol. Cell Biol. 3:2156-2165
(1983)) and the pVL series (Lucklow et al., Virology 170:31-39 (1989)).

In certain embodiments of the invention, the nucleic acid molecules described herein are
expressed in mammalian cells using mammalian expression vectors. Examples of mammalian expression
vectors include pCDM8 (Seed, B. Nature 329:840 (1987)) and pMT2PC (Kaufman et al., EMBO J.
6:187-195 (1987)).

The expression vectors listed herein are provided by way of example only of the well-known
vectors available to those of ordinary skill in the art that would be useful to express the
nucleic acid molecules. The person of ordinary skill in the art would be aware of other vectors
suitable for maintenance propagation or expression of the nucleic acid molecules described
herein. These are found for example in Sambrook, J., Fritsh, E. F., and Maniatis, T. Molecular
Cloning: A Laboratory Manual. 2nd, ed., Cold Spring Harbor Laboratory, Cold Spring Harbor
Laboratory Press, Cold Spring Harbor, N.Y., 1989.

The invention also encompasses vectors in which the nucleic acid sequences described herein are
cloned into the vector in reverse orientation, but operably linked to a regulatory sequence that
permits transcription of antisense RNA. Thus, an antisense transcript can be produced to all, or
to a portion, of the nucleic acid molecule sequences described herein, including both coding and
non-coding regions. Expression of this antisense RNA is subject to each of the parameters
described above in relation to expression of the sense RNA (regulatory sequences, constitutive or
inducible expression, tissue-specific expression).

[...] eukaryotic cells such as mammalian cells.

The recombinant host cells are prepared by introducing the vector constructs described herein
into the cells by techniques readily available to the person of ordinary skill in the art. These
include, but are not limited to, calcium phosphate transfection, DEAE-dextran-mediated
transfection, cationic lipid-mediated transfection, transduction, infection, lipofection, and
other techniques such as those found in Sambrook, et al. (Molecular Cloning: A Laboratory Manual.
2nd, ed., Cold Spring Harbor Laboratory, Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
N.Y., 1989).

Host cells can contain more than one vector. Thus, different nucleotide sequences can be
introduced on different vectors of the same cell. Similarly, the nucleic acid molecules can be
introduced either alone or with other nucleic acid molecules that are not related to the nucleic
acid molecules such as those providing trans-acting factors for expression vectors. When more
than one vector is introduced into a cell, the vectors can be introduced independently,
co-introduced or joined to the nucleic acid molecule vector.

In the case of bacteriophage and viral vectors, these can be introduced into cells as packaged or
encapsulated virus by standard procedures for infection and transduction. Viral vectors can be
replication-competent or replication-defective. In the case in which viral replication is
defective, replication will occur in host cells providing functions that complement the defects.

Vectors generally include selectable markers that enable the selection of the subpopulation of
cells that contain the recombinant vector constructs. The marker can be contained in the same
vector that contains the nucleic acid molecules described herein or may be on a separate vector.
Markers include tetracycline or ampicillin-resistance genes for prokaryotic host cells and
dihydrofolate reductase or neomycin resistance for eukaryotic host cells. However, any marker
that provides selection for a phenotypic trait will be effective.

While the mature proteins can be produced in bacteria, yeast, mammalian cells, and other cells
under the control of the appropriate regulatory sequences, cell-free transcription and
translation systems can also be used to produce these proteins using RNA derived from the DNA
constructs described herein.

Where secretion of the peptide is desired, which is difficult to achieve with
multi-transmembrane domain containing proteins such as kinases, appropriate secretion signals are
incorporated into the vector. The signal sequence can be endogenous to the peptides or
heterologous to these peptides.

Where the peptide is not secreted into the medium, which is typically the case with kinases, the
protein can be isolated from the host cell by standard disruption procedures, including freeze
thaw, sonication, mechanical disruption, use of lysing agents and the like. The peptide can then
be recovered and purified by well-known purification methods including ammonium sulfate
precipitation, acid extraction, anion or cationic exchange chromatography, phosphocellulose
chromatography, hydrophobic-interaction chromatography, affinity chromatography, hydroxylapatite
chromatography, lectin chromatography, or high performance liquid chromatography.

It is also understood that depending upon the host cell in recombinant production of the peptides
described herein, the [...] mediated process.

Uses of Vectors and Host Cells

The recombinant host cells expressing the peptides described herein have a variety of uses.
First, the cells are useful for producing a kinase protein or peptide that can be further
purified to produce desired amounts of kinase protein or fragments. Thus, host cells containing
expression vectors are useful for peptide production.

Host cells are also useful for conducting cell-based assays involving the kinase protein or
kinase protein fragments, such as those described above as well as other formats known in the
art. Thus, a recombinant host cell expressing a native kinase protein is useful for assaying
compounds that stimulate or inhibit kinase protein function.

Host cells are also useful for identifying kinase protein mutants in which these functions are
affected. If the mutants naturally occur and give rise to a pathology, host cells containing the
mutations are useful to assay compounds that have a desired effect on the mutant kinase protein
(for example, stimulating or inhibiting function) which may not be indicated by their effect on
the native kinase protein.

Genetically engineered host cells can be further used to produce non-human transgenic animals. A
transgenic animal is preferably a mammal, for example a rodent, such as a rat or mouse, in which
one or more of the cells of the animal include a transgene. A transgene is exogenous DNA which is
integrated into the genome of a cell from which a transgenic animal develops and which remains in
the genome of the mature animal in one or more cell types or tissues of the transgenic animal.
These animals are useful for
studying the function of a kinase protein and identifying and evaluating modulators of kinase
protein activity. Other examples of transgenic animals include non-human primates, sheep, dogs,
cows, goats, chickens, and amphibians.

A transgenic animal can be produced by introducing nucleic acid into the male pronuclei of a
fertilized oocyte, e.g., by microinjection, retroviral infection, and allowing the oocyte to
develop in a pseudopregnant female foster animal. Any of the kinase protein nucleotide sequences
can be introduced as a transgene into the genome of a non-human animal, such as a mouse.

Any of the regulatory or other sequences useful in expression vectors can form part of the
transgenic sequence. This includes intronic sequences and polyadenylation signals, if not already
included. A tissue-specific regulatory sequence(s) can be operably linked to the transgene to
direct expression of the kinase protein to particular cells.

Methods for generating transgenic animals via embryo manipulation and microinjection,
particularly animals such as mice, have become conventional in the art and are described, for
example, in U.S. Pat. Nos. 4,736,866 and 4,870,009, both by Leder et al., U.S. Pat. No. 4,873,191
by Wagner et al. and in Hogan, B., Manipulating the Mouse Embryo, (Cold Spring Harbor Laboratory
Press, Cold Spring Harbor, N.Y., 1986). Similar methods are used for production of other
transgenic animals. A transgenic founder animal can be identified based upon the presence of the
transgene in its genome and/or expression of transgenic mRNA in tissues or cells of the animals.
A transgenic founder animal can then be used to breed additional animals carrying the transgene.
Moreover, transgenic animals carrying a transgene can further be bred to other transgenic animals
carrying other transgenes. A transgenic animal also includes animals in which the entire animal
or tissues in the animal have been produced using the homologously recombinant host cells
described herein.

In another embodiment, transgenic non-human animals can be produced which contain selected
systems that allow for regulated expression of the transgene. One example of such a system is the
cre/loxP recombinase system of bacteriophage P1. For a description of the cre/loxP recombinase
system, see, e.g., Lakso et al. PNAS 89:6232-6236 (1992). Another example of a recombinase system
is the FLP recombinase system of S. cerevisiae (O'Gorman et al. Science 251:1351-1355 (1991)). If
a cre/loxP recombinase system is used to regulate expression of the transgene, animals containing
transgenes encoding both the Cre recombinase and a selected protein is required. Such animals can
be provided through the construction of "double" transgenic animals, e.g., by mating two
transgenic animals, one containing a transgene encoding a selected protein and the other
containing a transgene encoding a recombinase.

Clones of the non-human transgenic animals described herein can also be produced according to the
methods described in Wilmut, I. et al. Nature 385:810-813 (1997) and PCT International
Publication Nos. WO 97/07668 and WO 97/07669. In brief, a cell, e.g., a somatic cell, from the
transgenic animal can be isolated and induced to exit the growth cycle and enter G0 phase. The
quiescent cell can then be fused, e.g., through the use of electrical pulses, to an enucleated
oocyte from an animal of the same species from which the quiescent cell is isolated. The
reconstructed oocyte is then cultured such that it develops to morula or blastocyst and then
transferred to pseudopregnant female foster animal. The offspring born of this female foster
animal will be a clone of the animal from which the cell, e.g., the somatic cell, is isolated.

Transgenic animals containing recombinant cells that express the peptides described herein are
useful to conduct the assays described herein in an in vivo context. Accordingly, the various
physiological factors that are present in vivo and that could affect substrate binding, kinase
protein activation, and signal transduction, may not be evident from in vitro cell-free or
cell-based assays. Accordingly, it is useful to provide non-human transgenic animals to assay in
vivo kinase protein function, including substrate interaction, the effect of specific mutant
kinase proteins on kinase protein function and substrate interaction, and the effect of chimeric
kinase proteins. It is also possible to assess the effect of null mutations, that is, mutations
that substantially or completely eliminate one or more kinase protein functions.

All publications and patents mentioned in the above specification are herein incorporated by
reference. Various modifications and variations of the described method and system of the
invention will be apparent to those skilled in the art without departing from the scope and
spirit of the invention. Although the invention has been described in connection with specific
preferred embodiments, it should be understood that the invention as claimed should not be unduly
limited to such specific embodiments. Indeed, various modifications of the above-described modes
for carrying out the invention which are obvious to those skilled in the field of molecular
biology or related fields are intended to be within the scope of the following claims.



SEQUENCE LISTING 

<160> NUMBER OF SEQ ID NOS: 4

<210> SEQ ID NO 1
<211> LENGTH: 2320
<212> TYPE: DNA
<213> ORGANISM: Human

<400> SEQUENCE: 1

cccagggcgc cgtaggcggt gcatcccgtt cgcgcctggg gctgtggtct tcccgcgcct 60 

gaggeggegg eggcaggage tgaggggagt tgtagggaac tgaggggagc tgctgtgtcc 120 

cccgcctcct cctccccatt tccgcgctcc egggaccatg tccgcgctgg egggtgaaga 180 



tgtctggagg tgtccaggct gtggggacca cattgctcca agecagatat ggtacaggac 240 






tgtcaacgaa acctggcacg gctettgctt ccggtgaaag tgatgcgcag cctggaccac 300

cccaatgtgc tcaagttcat tggtgtgctg tacaaggata agaagecgaa ccigc^a^a 360

gagtacattg aggggggcac actgaaggac tttctgcgca gtatggatcc g«ccc«gg 420

cagcagaagg tcaggtttgc caaaggaatc gcctccggaa tggacaagac tgtggtggtg 480

gcagactttg ggctgtcacg getcatagtg gaagagagga aaagggcccc catggagaag 540

gccaccacca agaaacgcac cttgcgcaag aacgaccgca agaagcgcta cacggtggtg 600

ggaaacccct actggatggc ccctgagatg ctgaacggaa agagctatga tgagacggtg 660

gatatcttct cctttgggat cgttctctgt gagatcattg ggcaggtgta tgcagatcct 720

gactgccttc cccgaacact ggactttggc ctcaacgtga agcvtitctg ggagaa^s.*. 780

gttcccacag attgtccccc ggccttcttc ccgctggccg ccatctgctg cagactggag 840

cetgagagca gaccagcatt ctcgaaattg gaggactcct ttgaggccct ctccctgtac 900

ctgggggagc tgggcatccc getgcctgca gagetggagg agttggacca cactgtgagc 960

atgcagtacg gcctgacccg ggactcacct ccctagccct ggcccagccc cctgcagggg 1020

ggtgttctac agccagcatt gcccctctgt gccccattcc tgctgtgagc agggccgtcc 1080

gggcttcctg tggattggcg gaatgtttag aagcagaaca aaccattcct attacctccc 1140

caggaggcaa gtgggcgcag caccagggaa atgtatctcc acaggttctg gggcctagtt 1200

actgtctgta aatccaatac ttgcctgaaa gctgtgaaga agaaaaaaac ccctggcctt 1260

tgggccagga ggaatctgtt aetcgaatcc acccaggaac tccctggcag tggattgtgg 1320

gaggctcttg cttacactaa tcagcgtgac ctggacctgc tgggcaggat cccagggtga 1380

acctgcctgt gaactctgaa gtcactagtc cagctgggtg caggaggact tcaagtgtgt 1440

ggacgaaaga aagactgatg gctcaaaggg tgtgaaaaag teagtgatgc icccccwc 1500

tactccagat cctgtccttc ctggagcaag gttgagggag taggttttga agagtccctt 1560

aatatgtggt ggaacaggcc aggagttaga gaaagggctg gcttctgttt acercgc^cac 1620

tggctctagc cagcccaggg accacatcaa tgtgagagga agcccccacc ica^tfct-n. 1680

aaacttaata ctgg&gactg gctgagaact tacggacaac atcctttctg tctgaaacaa 1740

acagtcacaa gcacaggaag aggctggggg actagaaaga ggccctgccc tctagaaagc 1800


tcegatcttg 


. _ .1./,.^ .H-rn fi n+floactcc -fctatrt ca<iat gcctaaaaca 

qcttctgtta etc atflctcg ^9^373^ i.*,**^*.*-**^—- j -» 


1860


ttttgcctaa agctcgatgg gttctggagg acagtgtggc ttgtcacagg cewgagtet 1920

gagggagggg agtgggagtc tcagcaatct cttggtcttg gcttcatggc aaccactgct 1980

cacccttcaa catgcctggt ttaggcagca gcttgggctg ggaagaggtg gtggcagagt 2040

ctcaaagctg agatgetgag agagatagct ccctgagctg ggpcatctga cttctacctc 2100

ccatgtttgc tctcccaact cattagctcc tgggcagcat cctcctgagc cacatgtgca 2160

ggtactggaa aacctccatc ttggctccca gagctctagg aactcttcat cacaactaga 2220

tttgcctctt ctaagtgtct atgagcttgc accatattta ataaattggg aatgggtttg 2280

gggtattaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa 2320



<210> SEQ ID NO 2
<211> LENGTH: 255
<212> TYPE: PRT
<213> ORGANISM: Human

<400> SEQUENCE: 2



Met Val Gln Asp Cys Gln Arg Asn Leu Ala Arg Leu Leu Leu Pro Val
1               5                   10                  15

Lys Val Met Arg Ser Leu Asp His Pro Asn Val Leu Lys Phe Ile Gly
            20                  25                  30

Val Leu Tyr Lys Asp Lys Lys Leu Asn Leu Leu Thr Glu Tyr Ile Glu
        35                  40                  45

Gly Gly Thr Leu Lys Asp Phe Leu Arg Ser Met Asp Pro Phe Pro Trp
    50                  55                  60

Gln Gln Lys Val Arg Phe Ala Lys Gly Ile Ala Ser Gly Met Asp Lys
65                  70                  75                  80

Thr Val Val Val Ala Asp Phe Gly Leu Ser Arg Leu Ile Val Glu Glu
                85                  90                  95

Arg Lys Arg Ala Pro Met Glu Lys Ala Thr Thr Lys Lys Arg Thr Leu
            100                 105                 110

Arg Lys Asn Asp Arg Lys Lys Arg Tyr Thr Val Val Gly Asn Pro Tyr
        115                 120                 125

Trp Met Ala Pro Glu Met Leu Asn Gly Lys Ser Tyr Asp Glu Thr Val
    130                 135                 140

Asp Ile Phe Ser Phe Gly Ile Val Leu Cys Glu Ile Ile Gly Gln Val
145                 150                 155                 160

Tyr Ala Asp Pro Asp Cys Leu Pro Arg Thr Leu Asp Phe Gly Leu Asn
                165                 170                 175

Val Lys Leu Phe Trp Glu Lys Phe Val Pro Thr Asp Cys Pro Pro Ala
            180                 185                 190

Phe Phe Pro Leu Ala Ala Ile Cys Cys Arg Leu Glu Pro Glu Ser Arg
        195                 200                 205

Pro Ala Phe Ser Lys Leu Glu Asp Ser Phe Glu Ala Leu Ser Leu Tyr
    210                 215                 220

Leu Gly Glu Leu Gly Ile Pro Leu Pro Ala Glu Leu Glu Glu Leu Asp
225                 230                 235                 240

His Thr Val Ser Met Gln Tyr Gly Leu Thr Arg Asp Ser Pro Pro
                245                 250                 255



<210> SEQ ID NO 3 
<211> LENGTH: 59065 
<212> TYPE: DNA
<213> ORGANISM: Human

<400> SEQUENCE: 3 

tcatccttgc gcaggggcca tgctaacctt ctgtgtctca gtccaatttt aatgtatgtg 60 

ctgctgaagc gagagtacca gaggtttttt tgatggcagt gacttgaact tatttaaaag 120 
ataaggagga gccagtgagg gagaggggtg ctgtaaagat aactaaaagt gcacttcttc 180

taagaagtaa gatggaatgg gatccagaac aggggtgtca taccgagtag cccagecttt 240 

gttccgtgga cactggggag tctaacccag agctgagata gcttgcagtg tggatgagcc 300 

agctgagtac agcagatagg gaaaagaagc caaaaatctg aagtagggct ggggtgaagg 360 

acagggaagg gctagagaga catttggaaa gtgaaaccag gtggatatga gaggagagag 420 

tagagggtct tgatttcggg tctttcatgc ttaacccaaa gcaggtacta aagtatgtgt 480

tgattgaatg tctttgggtt tctcaagact ggagaaagca gggcaagctc tggagggtat 540 

ggcaataaca agttatcttg aatatcctca tggtggaaag tcctgatcct gtttgaattt 600 

tggaaataga aatcattcag agccaagaga ttgaattgtt gagtaagtgg gtggtcaggt 660

tacagactta attttgggtt aaaaagtaaa aacaagaaac aaggtgtggc tctaaaataa 720 




tgagatgtgc tgggggtggg gcatggcagc tcataaactg accctgaaag ctcttacatg 780 

taagagttcc aaaaatattt ccaaaacttg gaagattcat ttggatgttt gtgttcatta 840 

aaatctctca ctaattcatt gtcttgtcca ctgtccgtaa cccaaectgg gattggtttg 900 

agtgagtctc tcagactttc tgccttggag tttgtgagag agatggcata ctctgtgacc 960 

actgtcaccc taaaaccaaa aaggcccctc ttgacaagga gtctgaggat tttagaccca 1020

ggaagaatga gtgatgggca tatatatatc ctattactga ggcatgagaa gagtggaatg 1080 

ggtgggttga ggtggtgttt taaggcctct tgccagcttg tttaactctt ctctggggaa 1140

cgagggggac aactgtgtac attggctgct ccagaatgat gttgagcaat cttgaagtgc 1200 

caggagctgt gctttgtcta ttcatggccc ctgtgcctgt gaaacagggt tcggtgactg 1260 

tcactgtgcc tgtggcagtc tgtagttacc cagagagaac aaagctgcat acacagagcg 1320 

cacaagggag tcttgtaaca accttgtcct gctttctagg gctgagtcag gtaccacagc 1380 

ttgatctcag ctgtcctctt tatttcaaga agttgacatc tgagccatac caggagtatt 1440 

gtattttgtt tgaggcctct ctttttggag gaacatggac cgactctgtg cttttgtcta 1500 

tgctggtctc tgagctcaca caacccttca ccctcctttc tcagccagtg ataggtaagt 1560 

cttccctatc ttgcaaggct cagctcaagt gtcagcttcc tctacaaaga ctttcctggt 1620 

tcccetcatt ggagtgaaca agagttgaca tggtagaatg gaaagagcag aagctttaga 1680 

atgagccaga cctgagtatg aatgctagat ccaccactta gctagtcaac cctgccccct 1740 

gcctcaagtt ttaattttcc tatccattaa gtgaatataa taatacctgt gtcacaggat 1800 

tattttgaga attaaatgag attaggtcta tgaaagcacc tagcagagtt cttggcatat 1860

aggaggcatt cattaaatat ttgttcttcc ccttttatac ccattacttt tctttttctg 1920 

aactaaaata atacttggtt ctatctctga aataacatcc aagtgaaaaa tcaacaacat 1980 

gaaagagcag ttcttttcca gtggatttgc ttcttaagga gcagagatta tgtaatctaa 2040 

cagcctccaa catacaaaga gctttgtatc tagaacaggg gtccccagcc cctggaccgc 2100 

caactggtac gggtctgtag cctgttagga accaggctgc acagcaggag gtgagcggcg 2160 

ggccagtgag cattgctgcc tgagctctgc ctcctgtcag atcagtggtg gcattagatt 2220 

ctcataggag tgtgaaccct attgtgaact gcacatgcaa gggatctggg ttgcatgctc 2280 

cttatgagaa tctcactaat ggctgatgat ctgagttgga acagtttgat accaaaacca 2340 

tccccccgcc ccccaacccc cagcctaggg tccgtggaaa aattggcccc tggtgccaaa 2400 

aaggttgagg actgctgatc tagaggacca atttattcaa tgttggttga gtaaatgagc 2460 

tcttggatta ggtgatggaa aaatctgaaa aaacagggct tttgaggaat aggaaaaggc 2520 

agtaacatgt ttaacccaga gagaagtttc tggctgttgg ctgggaatag tcataggaag 2580

ggctgacact gaaaagaagg agattgtgtt cgtttcttct tctcagagct ataagcaaag 2640 

gctgaaagtt ctagaaaaag gcaagttttg tttcagtaga aaaaaggata atcagaacca 2700 

tttttagaaa atggaatgag actacttttg aggccatgag ttccttgtcc ctggagagat 2760 

gagcagaggt tggacaagtg cttaccagag atcttgtgga ggcagaaact gtgcatctag 2820 

cagagcattg gcctaaccct ttcaaatgag atgctgttaa ctcagtctta ttctacatgg 2880 

taggaatcct gtccctttgc ctcctgctac tttgggcctc tcaacctctt ggttttgtgt 2940 

gcaggtgaag atgtctggag gtgtccaggc tgtggggacc acattgctcc aagccagata 3000

tggtacagga ctgtcaacga aacctggcac ggctcttgct tccggtaggt gggcctatcc 3060 

tcccatcttt accagtgtac tatgggccaa gcactatttc atgttctgat ggaaaacaca 3120 
gaaacaagct tctgagttga gaatttcaat cttagggtgg ggaaaggaat gtaccaagga 3180

agagctcatg accaaacctc aagtgtggcc cccctgaacc caggttaaat tggaagagcc 3240 

ataaatgggc cagctggagg cagggtgggg ggatgagagg agccctttcc agggttgtcc 3300 

catatccctc actttatggg tgaggaaact gaggcccagg aagagtgact ttcctgtggc 3360 

tgcactacag attatgcagg tacttcaaga gttgtttgta ttcttatttt attttatttt 3420 

attttatttt attttatttt attttatgag agggattctt gctgttgccc aggctggagt 3480 

gcagtggtgc aatctcggct cactgcaatc tctgcctgct gggttcaagt gatttttctg 3540 

ccttagcttc ctgagtagct gagatgacag gcacctgcca ccatgcgcag ctaatttttg 3600 

tattttagtg gagacggggg tttcaacatg ttggtcaggc tggtcttgaa ctcctgacct 3660 

caaatgatgc acccacctcg acctcccaaa gtgctggaat tacaggcgtg aaccactgtg 3720 

cccagccaag agttgttttt agtgtggttg gcagagccag ctcttccttc accacaggat 3780 

gcctccctag gttcctactt tttgttacta gcttttatta tagctatatt attattatta 3840 

ttattattat tattattatt attattgaga cagagtctcg ctctgtcgcc caggctggtg 3900 

tacagtggtg cgatcccggg ctcactgcaa cctctgcctc ccgagttcaa gcagttetcc 3960 

tgcctcagcc ccccgagtag gtgggactac aggcgcctgc caccacaccc ggctaatttt 4020 

tgtattttta gtagagacgg ggtttcacct tgttgaccag gctggtctgg agctcctgac 4080 

ctcaggtaag tgctagaatc acaggcgtga accactgcgc ccagccaaga gttgttttta 4140 

gtgtggttgg cagagecage tcttcctcac cacaggttgc ctccctaggt tcctactttt 4200

tgttactagc tttattatag ctacattatt attattattg ttattattat tgagacagag 4260 

tctcgctctg tcgcccaggc tggtgtacag tgatgtgatc ttggctcact gcaacctctg 4320 

ccccccgagt tcaagcaatt ctcctgcttc agccccccta gtaggtggga ctccaggcac 4380 

ctgccaccac gcccagctaa tttttgtatt tttagtagag gcggggtttc accttgttgg 4440 

ccaggctggt ctcaaactcc tgacctcagg tgatccgcct gcctcggcct cccaaaatgt 4500 

tgggattaca ggcatgagee accgcgccct gectataget acattatttt tgtaggcagc 4560 

tcagtttctt aaaaattata cagacttcaa atcagatttg ttcctgctgt ctgaggctca 4620 

gtttcttcat ctggaaaatg gatggtaata atcttgttga gattgaatga aataatatat 4680 

gcagtgtatc cagtacatgg tagacaccca gtgaatggtt attccttcct cccatcggat 4740

tggaattctc aagggtggga acttgtcttt atattcttca caaegtaaaa tagttgaaat 4800

ttgttggtgg aaagaagagc agtccactcc agaggctgga tgggcatgee tggcccccaa 4860

ggtctgaagt ggtagggctg tgectatate ctgagaatga gatagactag gcaggcacct 4920

tgtgctgtag attccagctc ctgeacatag ctcttgttgt aaaacatccc tgtgcttata 4980

ccaagtaatt gagttgacct ttaaacactt gcctcttccc tgggaaccat ataggggatt 5040 

ggcctggaga cgtctggcct ctggaagagt tggaaagcag ccatcattat tatcctttcc 5100 

tttcagctat aactcagagc tctcaagtct tttctgtgga tettattgee ttggttcttg 5160 

ccccttttac tcccagggaa gttgattctg tcttttctgt tccatttagt atgacaggag 5220 

cagagaatgt cagagctgta agggacctta tagttaaagc ctttggctgg tcctttcatt 5280 

ttatagctgg gactaataag taacgtcaaa acccaatgag ttcacagatt gggtctcgcc 5340 

ttggcatgta acccatatgt tcatattctt gctgttttcc tatgtgtatg aatattttct 5400 

atccaaaata agcaggacag ggtagagcaa gttaatcttt ggaatttctg gattctctta 5460 
gagctaaaaa acttcagaac tagaagaaac cacccactat atggtataac ccattcatat 5520

cacagatgag gcctgaaacc aaaaagactt gctcaggcca tggatgacaa gagctggccc 5580

tagcactgaa ctcttgggtc atttgtaggt ctagtcagat gctagcttgt tagctctgtg 5640 

cgtgcgtgtg tgtgtgtgtg tgtgtgtgtg tgtgtgagat agagacagaa agataacata 5700 

tgtacacaaa tacataaaga ggaagtagac acgttagcat ggtagataag agtacaggca 5760 

ggccaggcgt ggtggctcac gcctgtaatc ccagcacttt gggaggccaa ggcaggtgga 5820 

tcacctgagg tcaggaattc gagaccagcc tgaccaacat ggtgaaaccc catctctact 5880 

aaatacagaa aaaaattagc ttggcatggt ggcacatgcc tgtaatccca gctacttggg 5940 

aagctgaagc aggagaatcg cttgaatccg ggaagcagaa gttgcagtga gccgagattg 6000 

tgccattaca gtctagcctg ggcaacaaga gggaaactcc atcgcaaaaa aacaaccacc 6060 

accaagagta caggctatgg aatgagacta tggttttaaa tcctggcttt gcaatttatt 6120 

aactagcctt aagtgacttc cctgagcttc aggcaccaat ctgtaaaatg aggataagaa 6180 

tattactcat gccacatggt tgttagggag gattaaatgt gataacctat ataaagtggc 6240 

tagcatagca tctgacatat agaaaactct taatagggcc ggacgtggtg gcttatgcct 6300 

gtaatcctag cactctggga ggccgaggca gaaggatcgc ttgagcccat gagcccagga 6360 

gtttgagacc agcctggcca acatggcaaa actccacctc tacaaaaaat acaaaaatat 6420 

tagccaggcg tgatggcaca cacctgtagt cccagctact tgggaagctg aggagcgatg 6480 

attacctgag cccagggata tcaaggctgt agtgagctgt gatcatgcca ctgtactcca 6540 

tccagctggg ggacagagtg aaacccctgt ctcaaaacaa aacaaatgaa aaaaaaaacc 6600 . 

cttaataatc agtaactgtc actttatatt atgttgtgag tgtgtgtcta tatacaccta 6660 

tatgtataea tttctcttat tacacattca ttggtgatct gatgtggagc cccagggatt 6720 

aagggcaact ttgaactacc ctgacacaat caagccaaat atcattcccg tggaggaagt 6780 

agagtatcta ggttctgtct cctagttgca gctttacctt gaggacagag actctaatcc 6840 

agctgtgctg aaggagcaca tctcctgact tctgagcttt cccctggtaa attcaaactg 6900 

gatgtcacgg cgccctcaga tagagcctgg taatttgccc tggggagagt gactgtcttt 6960 

tggatctaat ttgacttttg ccccagttgg aggaaaatct tcagggctag gaaggattgt 7020 

atttgtctga ccccagagat aacctgggtt ttgaggaaca tggggcatca acctgaatgg 7080 

tcttgtaaga tctctcccac gccagcttgc cagtgtttct ctgatgaatt tagagtacct 7140 

gagtagtgca ggcctgctgg gaggaggact ctccctctgt gctactcaga gaaattcatt 7200 

cttcaaggcc cccttccagc cttgctctta cccagctggg ctacagttac aataaaggaa 7260 

atgacttttc ttctcccctt cccccagtac ctttgttttc ctagtcacag ggtggggctg 7320

gatattgaat ggagaaattg ctggggtcca tcctaaactc ctcccctcat ctctccctta 7380 

cattacccca ttcttctgtc tgcagccaca tccataatcc tgcctctgtt agccttccga 7440 

cagaccctca ggtgcccagg acaacaggaa gctacttaaa gctggaacct cagactgtgc 7500 

aatggaggcc agtgacaaaa ctgaaagtag ctetgtcagt aattgtgctg gtgcgattag 7560

gcagctggcc agaatctttt ggatctcctg gacatatggc tgactagtcc tcccaagcct 7620

tcccaacagg cctctttttt ttcctttttt tcttttcttt tttttctttc tttctttctt 7680 

tctttttttt ttttttttag gctagtgaag tgaaattgtg ggagtggaaa aggaacaaag 7740 

aaatcggtaa ctggtagtga tcaattactt gtaaacacta ttgtacttgg accagcccag 7800 

taggcctttt ttaaaactct gagttacctc tctttccttt ccttgagcag tgccattaat 7860 
tctgtotctg gggcaatcct ttctgatgtt ctctggacct ggctctctct ccttaggaga 7920

ggccaggaga gtagccagag agcatgtcat ttgtagctga ggttaaagtg tggagctatc 7980

aatggtgacc tggcctcttg gcatgttagc aagccagagg accttgacaa cttttttgat 8040

gattgtccgt tcaccctgat caaaggtgtt tggcttagga ggagggaaga aaagctaccc 8100

ctattagtct tgatggcccc agcgtgggtc tctattgctt gacctggttc ctagcagcat 8160

tatcagaagg aaaatccacc gctcttaagg ctcctgggaa ctttcaggac ttcctttctc 8220

aggattgcaa acataagact atttgagctt tcacttttga aaagcggtta ctaataccta 8280

tactctggga aagggctaat gcagatagaa gactgtggtc actgcatcag gcaacagacc 8340

atttccgcta aatttagtga ctccaggaag gccagtgaag aaataacaca cgtagcaacc 8400

agagactgtg ttgtaatatg ttggctgaca gcagggtact ttctgtgatg ctgaaagcca 8460

cattcatttt ctctcccctc atccccatct aagcaagcct ggtagaatca taattacagt 8520

aataggtacc acttattgag tactctgtgc cagacaccct cctgagcata cgacatgcat 8580

agcacattta atccttacaa tgacttaata aaatgtagta ctagtcttac ctacttcgag 8640

aatagggaaa tggaggttac ttgtttaaag tcacagagct aataggtagc atagctgaga 8700

tttgaactca ggcattctta ctccttgcct gcaagagtct cttggcattc ttgaatgcaa 8760

gcatatttct taacctcact gaggctcagt ttcctcttat ataatatggg gtaaagagcc 8820

ctcaccctgc ctgccacaca ctggtagtgt cagataacat tgaagggtgt tagtttaaag 8880

gcttcatgga ctctataatg tcaacaaaag tgctgttaac tttcttctgg gtctcaggct 8940

cctgatgtag agtcagtgga gcaaccctgc catctgctgt tatgctgttg atgttgctgc 9000

cacacttact aacctaaacc tttgattetg gctgtggcct tctccagaag gtgtttactc 9060

atttgtccag tttatctttt aggaaacagc cagcccgtag atcattaagg ctggctattg 9120

gacagggggc tggggcctgc ctgacagagg aaggaagggc agacatctgg ttcttcctct 9180

gcccctacaa gagactccag cctgaccaca gagtggtact cctaggatgt agcagcagca 9240

tatgagcttg aatgtgcctt aatcctgctc tttactttga gaagagagaa ctaaggaccc 9300

acagatgttt cacagcttct ataggaggca gaggtagaaa aatggagaga gatgaggcca 9360

gagatagata actgatatta attaaacgtt gtattaagaa cctcacttag attatctgat 9420

tcaatcttca taataaccct gcaaccccca cctttttttg agaacagggt cttgctctgt 9480

tgtccaggct acagtgcact ggtacaatca tagttcactg cagtgtcaac ctcctgagct 9540

caagcaatcc tcccacctca gccttgcaag cagcttggac tacaggcgtg ccaccacacc 9600

ttgccatttt tttitatttt aagtagaaac aaggtcttat taatactatg ttgcccaggc 9660

tggtcttgaa ctccagcgat cctcetgccc cagcctccca aagtgcttgg gattacggaa 9720

gtaagccact gtgcctggcc agtgcaaccc ccattttata ctaaaacagg aaggcccaga 9780

aaggtttgga gtaacttgtc cagggtcaca cagatgatat ttgaactcag gtctccctgg 9840

ctcccaagag agtctgcttt ccactaggac tcccaggaga aaaaaaaaaa aaaaaacagt 9900

agacttggag acagaaaatc tgatttgagt cttagttgag ctaggctaac tgtgtaactg 9960

tgggcaagtt ccttagcccc tgtgagcctc agtttcttat ctgtaaaatg tcataaaaga 10020

aatccatctc atggagtagt tgtgatgatc aaggactctg aaaacattag aatggtttaa 10080

tgtgaaggat tagcagcagc acatggcaac attgtgcatc ttatattaac tatccaaata 10140

tatcaagcgt catttgctat atataaaagt catcaaatta ggcactgtgg gggatacgga 10200

gttggcatac tagcctggcc tcttaattaa ttcattaatt agcttattta tttttgagat 10260

aggtcttgct ctattgccca ggctggagtg cagtggcatg atgatagctt actatagcct 10320 

caatctccca ggcttaaaca atcctcctga gtagctggga ctacaggcac acactaccat 10380 

gcccagctaa ttttttttta attttttgta gagacagggt cttgctctgt tgcccaggct 10440 

ggtctcaaac tcctgggctc gagatcctcc cacctgggcc tcacaaagtg ttgggattac 10500 

aggtatgagc cacggcacct ggcctggtct cttaactggt tccctaagac agctggaaat 10560 

agagaatgtc atggagcatt cctaaccatg ggctccagcc tggctttcat tctgtttctc 10620 

ccctgaaaca acattccttt agtaatattc cgaataacag cttcatcagt ctgtctaccg 10680 

accactcttc aggcttcatc ttatatgacc tcccaaactg cactaagggt tgtattagag 10740 

aaaagtggat aaagttcgga gtcaggctgc ttgagcttaa atgccagctt cacttaccag 10800

ccacctgacc atgagtcagc tgcttaacca ttctttgcca cagtttcctt gtctatgaaa 10860 

agggaaatgg ctcccacctc aaaaagttgt taacattaaa ttcaatcatg tattcaaagt 10920 

cctgagcaga atgtctggcc atgactggga cttaacagat gttagcattt attattagta 10980 

tctgtcagtc ttgaaatgtt ctcttccctt ggctttcatg acattccaca ctctcctggt 11040 

tttctcttac ctctctggta atacctgttt gcttatcctt ctttgtccag ctctgggatg 11100 

ttaccattcc ttcaggcgtg ctgttttctc cttaggcagt cttacacaca ctcatgactt 11160 

ccttccattg tcctccacac actgatgacc ctaaaatcag tatctccagc ctaaaccttt 11220 
ccactgagtt ctagacccat atgttgtact atcaacctgg cttgtccatt tgaatgtctt 11280

ccaggcactt cagactctct tctctagact ttgctggact ttcactcttc cccctaaaac 11340 

tggctcctct tccactgaaa catgtatgtc attgagaggc accaccatcc acecagtgcc 11400 

taagccagaa acctaggaat ccttgatacc tgttctetct catcctgcat atccaagcct 11460 

atcagtttta tctctaaatt atattttggt aggtttactt ctttcctttt ctcccaccac 11520 

caccctgctc caagctacca tcatctcacc tggatgtctg caatagcctc atctcccaca 11580 

gccactctgc accccctaat ctgttctcta tagagcagtt ggaaggagtg atttttgttg 11640 

tttgttttgt tttgttttag acagagtctc actctgttcc ccaaggctgg agtgcagtgg 11700 

cacaatttcg gctcactgca acttctgcct cccgggttta agcaattctc ctgcctcagc 11760 

ctcccaagta gctgggatta aggcaccggc ccccataccc agctaatttt tatattttta 11820 

gtagagatgg ggttttgcca tgttggccaa gctagtctcg aactcctgac ctcaagtgat 11880

ccacctgcct cggcctccca aagtgctggg attacaggtg tgagccactg cacctggctg 11940 

gaaggagtga tcttaaaaaa aaaaaaaaca aaaaaaaact tgactgtgtc actctgtgtt 12000

gtctctccta ccttgtaiac ttccacaact tcccagtgtt cttggataaa gaccaaaatc 12060 

cttaacttgg ccaggcgcgg tggctcacac ctatcatctc agcactttgg gaggccgagg 12120 

caggcagatc atgaagtcaa gagattgaga ccatcctggc caacatggtg aaaccccatc 12180 

tctactaaaa atacaaaaat tagctggtcg tggtggcgtg tgcctgtagt cccagctact 12240 

tgggaggctg aggcaggaga atcacttgaa cctgggaggc agaggttgca gtgagcccag 12300 

atcacgccac tgcactccag cctggtgaca gagtaagact ccatctcaaa aaaaaaaaaa 12360 

aaaaaaaaaa ttccttaatt tggcctacag tagagccctc cgtaatgtgg cctctctcca 12420 

catctccaca acctcctgct ccctgcactt cagcctcacc tctcttctgg acaggccctc 12480 

cttctgacaa gggctttgtt cattctgctc cctctgccta gaatgccccc ttactctgtt 12540 

cacttaactc ctgcttatcg tttagatctt tacctggatg gctcagagaa atatagaagt 12600 
aattcctcac cctgaaaaat aggttaggtc cctgttttat gttttcatag acctttcctt 12660

tgaggctttt tttaaaaaag tagttttaat ctcacattta ttcatgtgat catctcctta 12720

atgatatctt aagacctcta atagaacaat ttggtcatgg actgtggggt ttttgcccct 12780

cattgtgtca gcactgagca tattgttggc ataggaggga tatttgttga atgaattget 12840

agaggtggcc aagagatatg atgtaagtca ggcttttccc tgcccttccc cttccccttc 12900

cccacatcct tcctatagca gccaccgtgg etgcagttac tgtaaatggc aagacggaat 12960

cagttccgga cattgggttg ttttagaaaa ttgcctgcaa gtgtcagggt gataagttaa 13020

agctttgtct tttgccctca gaggagctat cccatagtga gtagaageca gagaagctga 13080

ccccaggagt ccttctttcc agcagcaggt ettgagctge acttctctgt agctacaatc 13140

caggcaggaa caagccctag gtacctccgg agaggagggc aagagaggaa gaatgagttc 13200

agetacteta gccaccaaac tgattatgaa ttgccctgaa atctgaaaaa tttcaattcc 13260

aatcgtaagt ttgttttgtt tcattttgtt ttcttaaatt gtatatttga aagatggcat 13320

taactaaaga tatatattca atatagagtg gaaaaaatgg aatacttgea tagtatcttt 13380

tacttatagg tgatttatga tggggagtgg ggtggatagg ttggcagttc ccccaagaag 13440

ttggaaatga agtttgtcct ctgtgagttg aactaattag atccacaagt aatgaaagca 13500

gtattgtgtt gtagttaaga gcacactcta gaaccagatt gcttagtttc aaatcctggt 13560

tetgectttt attatctgtg toctttgggc aagttacttg ccctttgtgt gcttcatttt 13620

tctcatctag aaaatggaga ggccaggcgt agtggctcat gectataate ccagcacttt 13680

gggaggcega ggegggcaga tcacctgagg tgagaagttc aagaccagcc tggecaacat 13740

ggtgaaaccc tgtctctaca aaaatacaaa aattagccag gcatgatggc gggtgcctgt 13800

aatcccagct acccaggagc ctgaggcggg agaaacactt gaacctggaa ggcagaggtt 13860

gtagtgagcc aggattgeae cactgcactc cagcctgggt gacaagagct agactcagtc 13920

taaaaaaaaa aaaaaaaaac aaactggaga tacaggctgg gtgeaggget tacacttata 13980

atatcagcac tttgggaggc etaggeggga ggattgcttg aactcaggag tttcaagatc 14040

agtctgggta acagagcaag acctcatccc cacaaaaaat caaaaattta gecaggcatg 14100

gtggctcatg cctgtggtcc cagctactca ggaggctgag gcgagaggat tgettgagee 14160

caggaggttg aggctgeagt gaaccatgac tgcaccacta catgccagcc tggatgacag 14220

agcaagaccc tatctcaaaa aaaaaaaaaa aaagaaacga gccaggcgcg tttgctcacg 14280

ccagtaatcc cagcactttg ggaggccaag gcaggtggat cacttgaggt caggagatcg 14340

agactagect ggccaacatg gtgaaacccc atctcaactg aaaatacaaa aattagccag 14400

gcatggtggc atgctcctgt agtcccagct actcacttgg aggctgaggc acgagaatcg 14460

cttgaaccca ggaggeggag gttgcagtgg gccaacatca tgtcactgea ctccagcctg 14520

ggagacagag cgagactctg tctcaataaa taaataaaca taaaataaaa taaaataaaa 14580

taaaataaaa taaaaaaata tggaggccag caggcaeggt ggctcacgca tgtaatccca 14640

gcactttggg aggecgaggg gggeggatea caaggtcagg agatcgagac catcctggct 14700

aacacagtga aaccgcgtct ctactaaaaa tacacaaaat tagecaggea tggtggcagg 14760

cacctgtagt ccctgctact caggaggctg aggcaggaga atggcgtgaa cccgggaggc 14820

ggagcttgea gtgagctgag atcgcgccac tgcagtccag cctgggcgac agagcaagac 14880

tctgtctcaa aaaaaaaaaa aaaaatggag gttgggcgcg gtggctcgcg cctgtaatcc 14940



cagcactttg ggaggtcgag gcgggcggat cacctgaggt caggagttcc agaccagcct 15000
ggccaacatg gtgaaacctt gtctctacta aaattacaaa aattagccag gcacgatggc 15060
aggcacctgt aatcccagct acttaggaga ctaaggcagg agaatagctt gaacctggga 15120
gatggaggtt gcagtgtgct gagatcgcgc cactgccctc cagtagagtg agattccgtc 15180 
tcaaaaaaaa aaaaaaagaa gaaatggaga tacaaactta ctacctacct ccttacaacc 15240 
taccctcaca gtattactgt gaataaaagt gtgtgtagca ctgggaacac tattcacaga 15300 
gcactcatga atgtttgttc tttgttatta gttactagag aggcaaatgt ctgccagggc 15360 
tgaataatat gtgtgaattg gtgattgtcg cacatatcta aagaagtagt tatttttttc 15420 
aattaaaact tagtttaaaa accaatataa ggccgagcgc agtggctcac acctgtaatc 15480 
ccagcacttt gggaggccga ggtgggcaga tcatttgagg tcaggagttc gagactagcc 15540 
tggccaacat ggtgaaaccc tgtctctgct aaaaaaaaaa aaaaagtaca aaaattagcc 15600 
aggcatgatg gcaggtccct gtaatcccag ctacttggga ggccgaggca ggagaattgc 15660 
ttgaacccag gaggtggagg ttgtagtgag ccgagtttgt gccactgcac ttcagcctgg 15720 
gtgacagagg gagacactgt ctcaaaaaaa aaaaaaaaaa accaaaacca atataataaa 15780 
taagtggcca gcaatgaaac agaaagtgaa aagttagtga agcaaaacta gtactgtatt 15840 
cagataaaga tgctgaatct agatttggtc accagaatag ggtcctttgt ggcaacctgg 15900 
gctagtttgg ctgactcacc actgccagga tgaaatttct ttcagtggct actcatttcc 15960 
ctttatttta agtccatgct cacagagcaa ccttctgatg cctaattcag cttcctggga 16020
tacttaataa caggaagggt ctggaagtag tacctgtata ggggatatga gtgttctgat 16080 
tttaatagtc aattcataag tgtacagagg gtttgataaa tggttaggtc agaaccatca 16140 
cagaatgtct acacctcttt ggacattagg aaggtcaaaa acctgaaagg ccaaaagcta 16200 
ggcctagatt agggtcattc accaagaaaa catcagectt gaagagttct ctgggtggtc 16260
caccagtcaa ccttcctttg atcacacctc cttcctcgtt gcttctttaa gcattgacct 16320 
gtaatgggta tggaattttt tgctcaccta actccttcct tttacagagg aagaagttga 16380 
ageccagaga gatttaatgg ettgectaag atcacacgca gattttctgt taaccagggt 16440 
gatttttcag gtgttccctg ccagacgagg gcttttttcc ttgaattgee tagagatttc 16500 
ttgagatatc cgaagcattt ttcccagtgc agectggaga aggatgtccc tgtcaacaca 16560 
gcatttgtta ctcaatgtta gacattcaat tttctaatta gtatcatgga gcaacagtgg 16620 
atgattatct ataaggggtt gcaattccat gcttatgtgc ttacagccca tatagacaaa 16680 
tatcagctgt taaaatgaca aggcagtaga gatgtggccc caggacaaag gcatactctg 16740 
ctgttagtga acactagttg gecagcaaat ttcacatggg catatacacg gccaactgta 16800 
gactttaggc atttataccc attcagagag ccaaactggc aactaaagat cagcattctc 16860 
tttggcattt cagctttgeg ttctgttaaa aatcactget tgcttaaata cctctgatag 16920 
ctcttcactg cctgtaggca actctttagc ctagcagact tggtctttag tgctctgccc 16980 
ctaetctctt ccaccattct ggcctcctgt etaattgetg cccatatgtg ecatgeaeta 17040 
gagcttacag acctgctcag cgttatatga gcataccata etctttatge etcagtgeat 17100 
ttgcacatgt tgttccttca ggccagaatg cctgttactg cctggcaatc agectattag 17160
agtctgccaa taccatccca tcttctgtgg aggagccccc cgccaaatcc acccatacct 17220 
ctccccacca atcagagact tcttctctct ttgttattct ettegttatt ctcttcatac 17280 
ctcagttata tccatttcag tatttgttta cacatctagc atcactctta gagtgtgaaa 17340 
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ttctccaagt 


gtggagccgt 


atctagtttg 


tctttgtatc 


ccagagctta 


geaaagtgee 


17400 


tagaatgtag 


tgggtgctca 


gagtgtttgc 


tgggtgaatg 


atgtatttgt 


tgaacgactc • 


17460 


tttggacact tgaataaagt ccatccagta tgcaccatta ccatctcttc gctctacaat 17520
attcttttag gcaagagctt atcttttgag gtgataagat oagctcaaac ttatgtagac 17580
taagacctca gtctgtaaat gtcatcccta agtcttaaac catcaaaacc agggcctcaa 17640
ggaatggcat gccttctgca actgtagcaa cctgctgtgc ttattttgcc gtgtttttca 17700
tttttccccc aaaagctaga gtcccttctc ccatgggcag tgctggaagt gtgctaacaa 17760
attctttctc catactgctt acgattacaa aaaaaaccct cagcatctca tgccagactt 17820
gagttaaggt tgttttcttt tgtgtgtcag ctgtattctg gtcatgactt cctgatgatg 17880
ccctatagag attttgctga gatcagaggg tgctccactg ccatcagtag cactgactct 17940
tgcagaagca ccgtttctga agttggctaa tgtcatccct cacgtttgtt tgtttgaaat 18000
ttgttttagt tccagagata gcactttcat ggaatgacgc tatcttctag aatcactttt 18060
tttttttttt tgagttggag tctcgctgtg tcgccaggct ggagtgcagt ggcacaatct 18120
cagctcactg caatctccac cttccgggtt caagtgattc ccctgcctca gcctcccgag 18180
gagctgttac tacaggcgca cacccccact cctggctaat tttatgtgtt ttagtagaga 18240
cggggtttca ccgtgttggc caggatggtc tcgatctcct gactttgtga tctgcctgct 18300
tcagcctccc aaagtgctgg gattacaggt gtgagtcacc gcgcctggcc tagaatcacc 18360
tttttatacc ataacgtgag caccactgcc gcgtcaccaa ggaaagagag aggcagctac 18420
tgtggggtta caaatgggta agagtggcac caggaaggtg aaagtctcta cttagccaag 18480
gcrttaacaaa atgtcaatca ccaaacattt atttattaag ctacgttcag gataagaaga 18540
tgaacaagct atctgtacat tcattttctc gtttgtaaca aggtaatgat agtgatctat 18600
cctgcctgcc tctgagggtt attgtgagaa taaaatgaaa tcaagtggaa aagcacttag 18660
gaaaaagaaa agcattggtt ttcaattgtt agtgtggatc agaaacactg gggcttgttt 18720
aaaatgcaga ttcttagccc cagtctcagc gattctgatt ctgtatatct gaagtgggac 18780
tcaggaatct tgattttcaa caagctgacc agagggtcca atgctgctat tcctttagtt 18840
acactttcag aaatattact gtaaatcaaa tggcaagaat aaaatagtta tttgaggcag 18900
ttttagtatg ttggacctgg agtccaaaga cttgggtcaa actccagctt tgtcagttcc 18960
tagacctgtg accttaaaca gcaaccttct ctgtgaacct tagttccctc aggaacggct 19020
ctggtcacct cctgctgtac tccattgatg actcaccaca taaggctccc tgggagtccc 19080
ccaaaccttt gctctcttaa ctccttttac agcctcctac atctcctgca ggtgctgtct 19140
tctcctcctt tttccaggcc ctgctctgac acagcattca ttctcctctg ggaagggttc 19200
cttcaatgtg tctccaagca catcacaccc aggaaggacc ctgtggccat atctgtctat 19260
caccagatca cactacgtga aggcaggcac taggtactgt cagtgcccag cataggcctg 19320
gcccatacca ggtgtccaca gatgcctagt aaagaaacct atgattcagg acccccatga 19380
tgagcaacto tagcactaga acagtgataa taactaatgt ttataatgca tcttcagttt 19440
acagagggct tttgtactca tcatctagtt tagttcctgc aacaacctct tgaggaatat 19500
agcacaagca ggacaaggga agcccagaga tgttaaataa tttatccaag tttatgctgc 19560
tgggaagggc agcactgaaa ttaaaagaaa agttttctga gctcaaatcc catgcccttt 19620
cctcaatgtg agctctagca aggtattcag gaatcctgcc tctacagttc agagcctcaa 19680
attgctgggt atgttgagtt cttgtatctg atttttctag atttcctgcc cacattctta 19740

ctgtctggat atcaggaaag agtttatcaa atgcctgtgg aaatccaaga taaggtctca 19800 

tgatgagtaa cccagtgaaa acatgaagtc aagtctaact agtcactact atttcactac 19860

tgctgactcc tgatgatcag ctccttttct aagtgcttac tgtccactta ttccatcatc 19920 

tgcctagaat ttatgtgaag gaatcaaagc aaaaggatca taaggcttcc tttttccagt 19980 

atgtttttcc tcctttttga aaactgggcc agttagctat ctccattttt atttcatgaa 20040 

tacatcccca gcgcctggta tatagtagat atggaacatt acactttgga gatattgcac 20100 

ccattctcca gtttctccaa agttactaac aatggttcca tcactgtgcc aacatatttt 20160 

cttttttcaa tatattggga aataattctc ccagtctgaa aatctgaaca catttcatgt 20220 

gacttggtat cctcatatgt cttgggcttc caattctcca ttcctagttt caagttcatg 20280 

aactgtaaaa caaaggatta gactaaatct ctaaagttct atccagatgc caaattcttt 20340 

tctctttcca tgatacctaa gatagatgcc aaatattgtc ttttacctgg tgtttgtgaa 20400

catgacatca cattacagga gtagcagata ctaaactctc actctgtaaa acactgactg 20460 

agttccatga gccagatact gaagtgagct tgttcacata tgttctcatt taatgctcat 20520 

aaccctgtga agctgggaat tgctgggaca ttttatttat ttatttattg agacggagtc 20580 

tggctctgtc acctaggctg gtgtgcaatg gcatgatctt ggctcaccgc aacctccgcc 20640 

tcccgggttc aagcgattct cttgcctcag cctccgcagt agctgggatt acggggcaca 20700 

caccaccaca tccagcrtaat tttgtatttt tagcagagat ggagtttctc catgttggcc 20760 

aggttggtca cgaacacttg acctcaagtg atctgcctgc ctcagcctcc caaagtgctg 20820 

ggattacagg catgagccac catgcctgcc cgggaccctt gttttagaag gatgactgct 20880 

gctataatgt agaaagtgat ttggaagagg ggaggagtgg ggcacgaaag atggttagta 20940 

gatgggggtg gtaatgctta cctttcagta tttggaggct tcggagtcct caaaaattct 21000 

cttccttgat tggagtcctc ccagccaata gagggcttca cacaaacagt ttcttgggtt 21060 

ttgaattgtt tgaccagagc tttcttccga caaaaggttg gggtgattca ttcacttacc 21120 

acaccttgcc tgaacattca cttggggctg ccggttatga aggctattgt tctccagcct 21180 

gtcacagacg ctttgaagac ctgtgcctca gctggttcta aggagtcagt ttgttcagct 21240 

ccgtgccagg tttccaactt atgaaatgtg ctggagatta acacctctcc tgccatttta 21300 

tccctactat aattgccagt caaaggattc ctgcagttgc ctctggcagc cataactgat 21360 

gaatgttctg ccagctgctc tgaggaccta gaagagcagt tttctatcca ggaccagttt 21420 

ccaagggtgg gagggtgaaa tatatcctcc agtgtgacat ttcatctccc agtgatgggt 21480

ggcttgggcc ctttgaagtt ggctctgagg aaccacacac ttgggtctga gcagccagca 21540

gcttatcaca tctggtgatc aatccttcaa aggttcctcc tgaagtctga atttttggag 21600 

gtcaaatgga ttccacctgg gaggggcttc tgcttcaact caggacatgg ggagaaggct 21660 

gttcctcttc cagggggagg cagttttcat ggcattgaga tgtcctctca cttattcccc 21720 

acccacccac caagtccttt gtaagaggag tagggggaga ggagagcgcc tgcagcctcc 21780 

tgctcacatt cctagacacc gactcactga gcccgtcgcc gctggaacag cagagctgtg 21840 

tgaaatgtca agaggagtta tgctcatagg ctccctggcc tcagtctctt tgtggcttgc 21900 

atattcttcc attagtactg tgttcatcac atggaaatca gcgggtacaa ttaaaagata 21960

atttgctagt cccagactta atttggggcc cccttcttgc ctgattgaat tacaggggaa 22020 

cataatagat ttttggtgag aaatagttgt ctgtgtggct gggagaaaga ttgctcccag 22080 
ctctccagct gggcagccct ttcagtatcc cgtatgttat ttccccactt ccagcccacc 22140
tcacctcctc tgtggccctt gtgtgtcccc tcggctagga tcctgacctc ctgctcaaga 22200
gtttaaactc aacttgagac ccaaggaaaa tagagagccc tctgcaacct cataggggtg 22260
aaaaatgttg atgctgggag ctatttagag acctaaccaa ggcccagaca gagagagtga 22320
cttgctaaag gccacatagc tagcccacag tagttgtaac aatagtctta atgatattaa 22380
tggctaacat ttatcaacct ttaatgtgtc ccagactttg tgccaagggc ttacatgcag 22440
tgcattgtcg cattcaaacc cagacagtct ggctctgggc ccaggctgag ctttggtata 22500
gcatggtaga acgttgtcta taatgtctag tctgggttca aatcctggct tcacttctca 22560
catttacagc tgagtgacct caggcaagtg atttaacctc cctgtacctc agttgcttta 22620
tctgtaaaga gaaaaatcac agcactgtgg aatagtgggg gttaaaattc attcatacaa 22680
gtagtgctgc aagcaatgtt taatacaggg tgagcacctg ttcagtgctt ccttcttctg 22740
gctgcctctg gggctagagt gtggtgtctt cgtggtatag atagatagat atggctgagc 22800
tctgcacaaa caccaagagc tgttcttcac tattagaggt agtaaacaga gtggttgagc 22860
tctgtggttc tagaacagag gccggcaagc tatggcccat tgcctatttt aatacggcct 22920
gtgattgatt gatttttttt ttctttttga gacagagttt cactcttgtt gcccaggctg 22980
gaatgcaatg gcacgaactc agctcaccgc aacctctgcc tcctgggttc aagcgattct 23040
cctgtctcag cctctcgagt agctgggatt acaggcatgt gccaccacgc ctggctaatt 23100
tttgtatttt tagtagagac agggtttctc catgttggtc aggctagtct cgaacttcca 23160
acctcaggtg atctgcccgc ctcagccttc caaagtgctg ggattacagg cgtgagccac 23220
catgactggc ctgattgact gattttttta gtagagatag ggtcttggtt tgttacccag 23280
gctggtctca aacttctggc ttcaagcagt cctccctcct tggcctctcg aatgctggga 23340
ttataggcat gagccactat gcctggccta tatgacctgt gatttttaat ggttagggga 23400
aaaaaagcaa aagaatgctt tgtgacatgt ggaaattaca tgaaactcaa atatcagtgt 23460
cccagcctgg gcaacaaagt gagaccctgt ctctacaaaa aataaaaaaa aataagccag 23520
ggccgggcgc agtggctcac acctataatc tcagcacttt gggaggccga ggcaagtgga 23580
tcacctgagg tcaggagttc aagaccagcc tgaccaatat ggtgaaaccc tgtctgtact 23640
aaaaacacaa aaattagccg agcatggtgg catgcgcctg tagtcccagc tacttgggag 23700
gctgagacaa gagaattgct tgaacctggg aggcggaggt tgcagtgagc caagatcgcg 23760
acactacact gcagcctggg caacagagcg agactccgac acacgcacgc acgcacacac 23820
acacacacac acacacacac acgctgggta tggtggccag cacgtgtggt cccaggatgc 23880
actggaggct taggtaggag gatcacttga gcttaggtgg ttgagactac aatgaaccat 23940
gtttatacca ctgcacttta gccagggcaa cagtgtgaga ctgaatctca aaagaaaaaa 24000
aaaaaaaaga aaaaaatctt tccataagta aatatctgtt ggaacatagc catgtccctt 24060
agtttatgtt ttatatatgg ctgcttttgc cctataatga cacaattgag tggccacgac 24120
agtctgtatg gcctgcagag cctaagatat ttgctctctg gccctttaca gaaaaagtgc 24180
cttgacctgt gctctagagc catatgtacc aggtttgaaa ctcagcctca cagctgggtg 24240
tgatggcacg catctgtagt cccagctact ctggaggctg aggtgagagg atcacttgag 24300
tccagaaggt cgaggtcaag attgtagtga gccatgatgg catcaccgca ctccagcctg 24360
agtgacagag agagaccctg actcaaaaaa aaaaaaacaa aaaaaaaaaa caccctcacc 24420
acttatcagc tatttgtctt gagaatagtg acataacccc tcagaaccta tttcctaatc 24480
tgttaaatga ggctgatgac gtttcctcct tttactggca ctttaaacat gatggataat 24540
oaatgctaag cacttaacac agggcctaga agatattaac tgctcaataa atggtagctt 24600

cttaacagta ttcaaaccca tgtgctctta tcacatgcat tgttgtccct gtgtccagtt 24660 

ggtggaatgg gaaaaggctc ccttgtaacc ccatctacca tcrtttatcag actttcctgc 24720 

catggttcac agtaagagat agaagctgca cggtgacttc tggctcttta coatggtgag 24780 

cggtgtgtgc ctggtaaggg agagctgatg tcactgcccc aaatccagta gtgagatctg 24840 

agtgttctgg tttcctccag cagccttgct ttttccttta caatcctgca ggcagggaga 24900

caagggcttt ctacatggta ggctctggtt tggtcatcgt cacaactggg ggctgttcag 24960 

gtgggctccc attccagata cctaggctta tcaatccctt ttggcacccc aggccttttt 25020 

ctccctcatg ccccattttt cagtttgaaa agcatggtta tcacaggaca agtagaagaa 25080 

gctccactgt ccactgaggc caatggatgg tgttctgcat gtgaacactc agtgaatagt 25140 

gagtgaatga gagtaacctg ggctccatcc tatttgcaga gagctttgga aaagattttt 25200 

ctccttaaag agccagaatg aagcctggta gtgggagagc tccagctcta gagtcacatg 25260 

agcctacatt taaattccag ccctgccact gactcccttt ttgaccttga gtgagttacc 25320 

taatctctct gtacctcact tttcttgtct gtagagtggg aataattcct gtctcagaga 25380 

aataaaagag tgcatatagt gtttgccaca tggagacaca tcaggtgtag gttaatactc 25440 

tgggccttgt ttccttattt gcaacacagc cctgccctgg agtggaagtg gcacctccca 25500 

ttggtcagct cttgaggctg tccccaggac aggcagaggg agggaatgaa tgggagccct 25560

agtgccagga cagaacagat ggcagctcag agctaggatg gctctctgga cctgtctctc 25620 

ctaccagagg tccccccgtc tggtgtggct cttcctggac ctggcatcct ctgctttttt 25680 

tttttttcca cctccaagca gaattactgt cctgtaggca gctcctctgc ttgaggacat 25740 

ctggggccag atatgttcac actctatcct gccttgccct tccctgagct caggatggac 25800

gctcaattgg tcccagttat tgtctgcagc gcctgcctgc agcctcgatc cagcccagct 25860 

ccaccccttg cctgcaaggt ctgtttccta acagctgctc caaccacaca cctcggttct 25920 

gcgggagccc ctcctcttcc tccctccctc cctcattcag gggtgggact gaagaagaag 25980 

gctaacttga cagcagcgct tctttcttag ctagtcaccg gcccctgctc aagaatgcca 26040 

gtgtgtgtgt agcctccaca gagaggtcgt tttctcggag tccagagggg ccgcctgagc 26100 

ttctgagaac tagggaggag ccatcccagc catgagcccc tgtgggaatc tgctgggggc 26160 

caagtggcct ggagtcctca ggctcccgca gctgctccgg agggagaggt gagctcaggg 26220 

cagcctgcct gcagccagag gtgccgggag ccccgggcct gtcatggtgg ccatctacag 26280

ccggcctgag gcagtcacag acggatttgc agctgagcct gtctatctgg tgtgggaaga 26340 

agatggggag ttacttgtca gtcccggctt acttcacctc cagagacctg tttcggtgag 26400 

ttggtctccg agttcccctc tccatctctc ctggcccctg gtcctgagag gagggtggtc 26460 

tccctaaatc tccttctcac ttagtccttt accatcggtt ctgccgggca gaagccagcg 26520 

gaggttatac ccaaggagaa tcggccttgt gaggtacccc cattatgtcc tggaagtggt 26580 

gaggggaggg atatacccag aaggaacttc ttagggagct ccagctcccc ttctatccca 26640 

gacaaacctg aaggagcctc caaaagatgc cactgacctg cccattgtag atgttactgc 26700 

ttccgggggg aatagcccaa atagagtgct gtttccagct ctcacatgtc ttacctgcgg 26760 

gccatgctgc ctgcccagga atttgtccca acaagcagga tgggcaggtt ttgccaaact 26820
gtggaaactg gcaagtcctg ggtgtgggta gcctggtaca cagtaggcac cttataaacg 26880
tttgttctct taatggcagg cacatttgcc tctggccttg aagggcttct gagctcccag 26940
gtgaatgtag ttgctgggga aagacctggg cgagtgcttc taagactgga gcaatgggct 27000
ttagagtgtt cctgagctgc tgggccagcc cccacacctc ctcagtccct aggcctaagt 27060
acctccacga gcctctctct gtggggcttc tcagagggag atgtggaaac tctacctcta 27120
acctggcttt ctttgctcat tgccccactc cacctcccat agaaactccc cagggggttt 27180
ctggccctct gggtcccttc tgaatggagc cattccaggc tagggtgggg tttgttttca 27240
ttctttggga gcagcctgtt gttccaaaaa ggctgcctcc ccctcaccag tggtcctggt 27300
cgacttttcc cttctggctt ctctaagcta ggtccagtgc ccagatcttg ctgccgggat 27360
actagtcagg tggccaggcc ctgggcagaa aagcagtgta ccatgtggtt ttgtggaatg 27420
accggaccct ggtagattgc tgggaagtgt ctggacaggg ggaaggggga agggaactgg 27480
tcctcaatgc tgactctacc aagcgccctg ctagacactt tatcctttaa tctctcaaca 27540
gcctaaagag attatatatc cccattttac agatgaggca accagtttca acagagttaa 27600
catatggagc ctcactgggc agctttttct gtcttcctga ctttctctca tccttcaggg 27660
ggctgcaggt ttgttttctt ctcctagtgg agaggaaatt ctcaggtttg ttttcctctc 27720
ctagcagaga gtaaaaaaag ggatagtttg cctgacttgt tgaaggtgtg gctgagattg 27780
ttttctaaag agccaatgga aattgatctt gagtttagga gaaagctttt acatgtggaa 27840
ttaagotgcc aagtgttgaa gtagccacat ttcaggtcct cattaatttc tcttaatcct 27900
gggaaggcag cttaggagaa gggttgttcc tttaggagcc aggaactata ccccttttac 27960
ccttggagag gcagggaagc cagggaggac acaacttctc aggaagagga gaagctagag 28020
cagatagtga actctcaacc tgaaccttta agggccagac cactaatgcc acccaagtcc 28080
acctgccgtt tgtcttgttc tgtcccaggc tttctggaga acctgatctt cttgccccta 28140
cccccaagct ccgtttgccc agctagagtc tggggggtac tgactgactt tcgtagacat 28200
tcttcccttc cccaaataag aggccacatt cctgaagtca cttctgaaga gatagctgcc 28260
acacagggct ctttcccccc agggagggac cacccagacc ctctgctctc ccaggtatcc 28320
gttaccacat cactacctgg tcagaaagct gtttctgcca ttagcccctc cctcttttat 28380
tataggatat cctcaagggc tcctctttgg gcctcagttt catccttggc agaaagtaga 28440
agctagactt cttgggctcc tgaacagggt ccttgctgga ttctgtgaaa caaattaagt 28500
tcttgaccct aggcctctgg gggagtacaa agtctatggg agttctgggg ctgtggttgc 28560
aaggaaagtg acgcaaccag attccatggg gacatgatca ggcgtgacat gtgagggagg 28620
aagagggagc aagggaatga agaatacaac ttctgtgtcc catacacccc tgcctgacag 28680
gccatacata ctcagcagag aatgcactgt ctttcctacc acactagcgt gaggagtgag 28740
ctgcaattac cactgtgctt ccaagtaaga aaatacctca aattggaatt tacaaaagag 28800
gtaaattagg gagtggcttt tgtcggacat ctttaaagca tttttctttt tatagaattt 28860
cacttaatgt ccaatactga tttaatgagc ttgggtttac acattatctc ttgaagaaaa 28920
caaatgaacc tttgtgttcc aaagcaatcc atgtttaaag ggaaaaaatt atgcataact 28980
ctgcccagct tcacagtaac ctttggcagg tgccttaggt cctctgggac tcttttcctt 29040
atctgaaaaa tgaaggactt ggatcaggtg aatggttccc agctctgcaa cttatgtggc 29100
tcctcagagg cacacaagct cttttccatt atttgccaaa taatggaggc cctgtcttta 29160
actgcagtac aactacacaa aatacttgaa actacagtct tcctggtttt tggttggaac 29220
tgaatcagtg cactctagca acacttattt cttgctgttc gtaggcttca ttatgtgttt 29280
ggttaatttt ttaaaacaac aataacatat tccataataa ttacagctta attggcagac 29340
tgtttcagtc tataggatct gcaggaagga ggagtaataa agggattttt gactgagctc 29400 
ttatggaaca gagtctctct aggcccctgt catatctgcc cttctgggcc ctggggaaaa 29460 
gttggcatcc ccagttgtgg tgctctccag gtgccctcag gctgtggtgg agggagcttc 29520 
ccattctctc cttcagccca ctcaattcag aggctagggg ctgaaagaag cttctctaca 29580
actggctgtt cactgggagg ttaagggatg accatccagc caggccttcc tcaggacatg 29640 
ggagggctta tgctttaaca tgtgtaaatc cactgcaata atgactggtt cttttacccc 29700 
ataaggttga gaatttacct gtaaacattt ttgtctgaag aatttggatg taagtgaggg 29760 
ctgggcctct atcttatctc acttggcttc tctcagcaca gcaccttgcc tgcttgttct 29820 
tacacatcct agatgcacag taactatttc ctaattatta gaaatctatt agaatcaatt 29880 
gatttcagct gggcttggtg gctccttcct gtaatcccag cactttggga ggctaaggct 29940 
ggaggatcac ctgagtccag gagtttaaga ccagcctggg caacataggg agaccctgtc 30000 
tctacaaaaa ataaaaaatt agccaggcat ggtggtgtgc acctgtagtc ccagctactc 30060 
aggaggctga ggcaggagga tctcttgagc ctgggaggtc agactacagt gagcaatgat 30120 
tgtgccactg cactccagcc tgggtgacag agtaagactc tgtctcttaa aaaaaaaaaa 30180
aaaaaagttg atttctattt ggatagataa ataattcatt ttaggacctt tctttttcac 30240
ttacagaaat ctgtttcatt ctgggctgag aagcaggtcc atattgctag gcataggaga 30300 
aaaaggggtc tgtctgcatt tgcccttggt ggtctcaaat tggggaggga aagaaatgaa 30360
cacttactgg ctaccttctg tgagccaggc atcatgcaag acatctgtac ataatttaat 30420 
tctcataacc ccataagata ttattagcaa tgtacaagtg aggaaactga ggctcagagt 30480 
catgaagtaa ctggccttgg gtgacacaga tggtaaatgg cagagaagga atatggatcc 30540 
aggtcttgaa agagaaaatc tcaactgatt atctttttta aaaaactcat atgttctctg 30600 
ctgactcaaa aggtctctgt gtggatctgg gttgacccac tgaactgacc atcagggttc 30660 
catgcacttt gtatctgccc aagccctcag aacccctcag taatgttttg gaagatgag* 30720 
tttggaggtt gtccttaggc atagcctcag cgtatgtagg cctctaggtg atctccccta 30780 
acctgaggat ttcagctcaa ttcactctgg ctcctcagga cagtgggatg actggttcag 30840 
acctcagctt taccacctcc cagctgggta ctcttctacc tacagccagg gcagattttg 30900 
actttcactt gaaacttcca aaaattgaaa ggtagaaaaa cagccttggc tttgggaaga 30960 
acgtatgatg tccatggcct ctaagcatct gaggtgggac atgttcgagt agcaccttac 31020 
agttccaaag tgtgttctgg gttctttgtt taaaagaaca gagactgctg gggaattgaa 31080 
cactgtgaag tatatgaagg aggagaattg tgctatttaa cattcagtac ttgggctaaa 31140 
ggagaagcat cacgaagtgt taacactcaa agggtcttga gctgtcaggg ctccagcttc 31200
cttattttca caggtgagaa tcctgaggct cagctgttga gatgtgctgt ctcactccgg 31260 
tgacatagta cagtggatgt ggctttgcag ccaagcacac atagcttcac attccagctc 31320 
catcaattat gtattgggca gctttgcaga atgatttgac tttaactctg cttttcagtc 31380
ttctgtaaaa cagggataat cctgctaccg tagggttgtc aggattagag ataatataaa 31440 
taaggtacct catataggac ctggattatg gctggcattc aataaatagt agctgttaat 31500 
tgatagctaa gctagaactc tgaagtctac catggcaact tcttaagtgg tctgagaacc 31560 
cagttgtgtt ctgtggcaaa acacagctta gggatccata cccagccctc ctgtcagctg 31620

ttcaccttcc agttcttcag agacatgtgt ggcagtgact ttggccacat ogctggctgt 31680 

gccctttaaa ggcattcctt gacacagata tgtggactgg tgacgttgct ctccagccag 31740 

gtgttcttcc cagcaggctg gcctggctgt ctcctgcatg cctgtacttg tttgtctccc 31800

tgctccctct cctgggcctg gccagagcta cttgcagcaa acaaaagcag gatattggca 31860 

atggaaagga gggtgtgttc tggtgctccc atgccctgcg gcgcacatac cattgcaagg 31920 

gcgtaacaga gcccaggcct gcatttgggt gcaaataagt ctgcacacag aagaaaagaa 31980

ggacctggtg accaggagcc atggaaccct tgtgctcccc tacctgggct actggttctt 32040 

gccactccta ccattttcag tttggaaata tttgttaagg ctttgctctt ccaggtcctt 32100 

tgcttggtgc tgagtctacc aagagtaagt gggatgctgt ttttgtcctc agggagctaa 32160 

cagtctagtg aagaagaaag atggttgccc aggaacttct aagtcagaag gcaggaggca 32220 

agaaggaagc ccctgctcct actgccagcc ctctgttggg caccccatag ttcttcagaa 32280 

ccacatttaa tcctcactgc aggccaggca tagtggctca cacctgtaat cgcagcactt 32340 

cgggaggcca aggcgggcag atcacttgag gtcgggagtt cgagaccagc ctcaccaaca 32400 

tggggaaacc ccgtctctac taaaaataga aaaattagcc gggtgtggtg gcatgcgcca 32460 

gtaatcccag ctactcagga ggctgaggtg ggaaaatcac ttgaactcgg gaagcagagg 32520 

ttgcagtgag ccgagattgt gccactgcac tccagcctgg gcgataagag caaaattcca 32580 

tctcaaaaaa aaaaagaaaa aagaaaaaat cctcactgct accttgaaag taggtgatga 32640 

cattgccatt tcacaaatga gaagtgaagg ggctagccca agatcactta ggtggtaaat 32700

ggtggtgcta agattagaac ctcagatcat ctagggaaaa acacagatat gcacagagtt 32760

aaggggaccc agggtattgt ttgtcctctt gtttcacagg tggggaaaca acccagagag 32820 

ggaaaggggc ttgtccaagg caatttagca cccaagaact tgaacccata tctctctcct 32880 

cctcatttag agctcatccc acatgtatct tatattgaga ggagtgtgag ccacatacca 32940 

agaacagtct tcccctctgc ctccaacctc actgtgcagt tttgagacac ttcacagcca 33000 

tactcttcat gccataccca gcccttaaga ccctgaagtt ccccttccat aagacaagta 33060 

ggaaaagcta tagggtaaaa atagccatca gtgtttgttg agcacccagg aggaattggg 33120 

cactccagaa agataaaggg attctcaggg acttgcttct ctagacttcc ctagctcagc 33180 

tgcttcaact cattcctgcc cctcttctct acctcccgca gtgctcagaa gtagtagaac 33240 

tcactgtggc ctctcacctt gcattgttga gttttattta gactttctct tcctcaactc 33300 

ttcataagct catgaaaggt gaagtagggt gccctgtgta tttatctttt atatc^gcag 33360 

tgcttagcaa gttataataa tgcacttgcc tggbaaaagg ctttctctca tacattagct 33420

tatttcctct tcacattggc tctttgtagt aataggatgc tattagttat tttcaatgag 33480 

agaaagctac taagagaagt tgtccagcta gtgacagtaa gtggctgata aagtgagctg 33540 

ccattacatt gtcatcatct ttaatagaag ttaacacata ctgagtttct actatattgg 33600 

gtcttttttt tttttttttt ttttttttta gagacggaat cttgctctgt tgtccaggct 33660 

ggaacgcagt ggtgcaattt tgggtcacca caacctccgc ttcccaggtt caagcgattc 33720
tcctgcctca gcctcctgag tagctgggac taccagtgca cgccaccacg cccggctaat 33780

ttttgtattt ttagtagaga cagggtttca ccatgttggc caggctggtc ttgaartcct 33840 

gaccttgtga tctgcccgcc tcagcctccc aaagtgctgg gattacaggt gtgagccacc 33900 
gcgccctgcc tatattagga cttttatata agctatctct agctagctag ctagctagct 33960
ataatgtttt ttgagacaga gtctgactct gtcacccagg ctggagtgca gtggcgtgat 34020
ctcgactcac tgcaacctcc acctcctggg ttccagtgat tctcctgcct cagcctcccg 34080
agrtagctggg attataggtg catgccacca cgcccagcta attttttgta tttttagtag 34140
accaggtttc accatgttgg ccaggctggt ctcgaactcc tgacttcaag tgatccaccc 34200
gcctcggcct cccaaagtgc tgggattata agcataagcc actgtgccca gctgctctct 34260
atatttttaa tacatattat ttccattaat tttcacagca gttcatttta tagatgagga 34320
aactaggcca gagaagtaaa atatcttgcc caagatgatg taactagtaa gtggcaggat 34380
caagattcaa accaagcaat gttcaaacct cttggaagca agaatgtggc cactgtggaa 34440
ggtgcaaggc cttgacaaca agaataggga aaagaaggaa ctagaaggaa agagatggca 34500
tgggctcagc aggccaggga gctcttagct gtgtgtgttg ggaagctcag aagggaggaa 34560
gaggttgtct gtgcaggtaa gtcctgagaa cacaccagac ttttgagagg tggagcttca 34620
tagccaggtc attaggggag aagggagcta tagatttttt tttttttttt tttttttttt 34680
ttttttttag agacggggtc ttactatgtt gcccaggctg gtcttgaact cctgggctca 34740
agtgatcctc ccacctcagc ctcccaaagt gctgggatta gaggcatcag ccaccccgcc 34800
cagcgagcta tggatctaac atgtacatct tacacagtgc taatagaatg ttgggtttct 34860
tccccaatat tttattttga aaaaaaattc aaatatatag aaaagttgaa aaatgtagtt 34920
caaagaacac ctacatacct ttcacataga ttcatgattt gttaatgtta tgccactttg 34980
tatatatctc tctccctcct atctgtatac ttttatttat ttatttttgc tgaactattt 35040
cagagtaact taaaggcatc ttgattttac ccttgaacag ttcaatatgt ttctgctaag 35100
aattctccta tataagtcag atatcattac atctaagaaa attcacggca attttacaat 35160
ataatattat agtccaaatc catatttcct cagttgttcc aaaaaatgtt catggctgtt 35220
tcctttttta atctaaattt gaatccaagt ttgaggcatt gtatttggtt gctgtgtctc 35280
tagggttttt aaaatctgtg ccttttcttc tccccatgac tttttagaag agtcaagacc 35340
ggttattctt atagaataac ccacattcta gatttgcctg attagttttt ttatacttaa 35400
cgtatttttg gcaagaacat tacattggta acgctgttgg tgatgggtca gttttgaaga 35460
gtggagatga ttaaactgct tttgttcatt gaagtatctg tcaagaccag agatccttaa 35520
ctggtgccat caataggttt cagagaatcc tttatatata caccctgtcc cccacctaaa 35580
ttatatacac atcttcttta tatattcatt tttctagggg aggcttcttg gcttttatca 35640
aattctcaga gggccccaag acccaaagag gttatgaaac actagtctgt ccactgaggc 35700
aggcaacaca gagctggttt ctggggcctt gttcagtctg aaccagcttc ccttggggag 35760
otagcacaag gctgtaactt tgccccatct tggctttgga tcaaagagga ctgtccatd 35820
tgttgtcata cctaggaacc agggacagct tatgtggcct ggttccaggg atccaggaga 35880
atttcagttc ttgtcttgcc tttcaggtgt tcagaatgcc aggattccct caccaactgg 35940
tactatgaga aggatgggaa gctctactgc cccaaggact actgggggaa gtttggggag 36000
ttctgtcatg ggtgctccct gctgatgaca gggcctttta tggtgagtga atcccttcat 36060
atctgcccct cttggtcttc agagtccatt gacagtgctt ccagttccct gtggcctgtt 36120
aatcttttag tctttccatc agccagggca tctcccttta tttattcatt cattcaacta 36180
gcaggtatca attgagcacc tactaagtga aaggtaagat ccttccctca aagacttaat 36240
agttgaacgt tgggagtggg aggagaggca ggcagagagg agacacaata tagttggata 36300
aggacctcca aggagagtgt tacaggctga gaggaggata tacttaggtt gtctttaggg 36360

aatcagaaaa ggagactctg gaataggctg gcagagagag gggctacctc ctatacctgc 36420 

tctggacaaa cgactttaag catagtgaca gatttgccaa ccctgtattg gaagaactga 36480 

tcttttttag tggggatgat tacttctggg gatttcttct cataactgag accaaaacag 36540 

ttttgtgcag tctcagaaat gacaggaggt accaatctga cacttccttt ggaagctcta 36600 

gggcagagag tgaaagagtg gattttgacg ggggccttgc ttggaggtca ttcacccacc 36660 

cctgtcctca ctccagcoac agtgataact cacttccttc ctccctttgt acacccttct 36720

ccccacctgc tcacaggtgg ctggggagtt caagtaccac ccagagtgct ttgcctgtat 36780 

gagctgcaag gtgatcattg aggatgggga tgcatatgca ctggtgcagc atgccaccct 36840 

ctactggtaa gatagtggtc ctttgtctat cctctcccat ataagagtgg ctggcgggga 36900 

gggacagtgg cagggtgagt tgggcagaag gagtgttagg gtagtcagag cattggattc 36960 

ttaccacagc agtgctctta accagctctt taacttgtaa gcagaatgat ttacacatgt 37020 

ctctaccctt tttccttacc aaccttgaaa atgtcttcac tctgccctgc aatcctccca 37080

gtgggaggca ctcttcaagg acgatcccag aacattaaag tcaaagaccc cttagagctc 37140 

accctgtcca accaccttgg ttgataaaag aagtcagcct ggggcccatg gaatagaata 37200 

gtacaagggc aaggttctca ttgtgagtca aaggtagagt gaagagaacc cagaccatct 37260 

caccccaacc caggccagtg tttttccaaa tataccactt gctgcagatc tagctcagca 37320 

cccccagtcc cagcccaccc tgagaaccca ggctcctcat tctgagcagc cagctagaat 37380 

catgacaaag agggtggtag tgagactatg ggtactgttg cttaaagcca catggtgcag 37440

tgyttgctgg ggggcttctg tgtgggactc tagcatctta ttcccccctg tgccctctcc 37500 

ccagtgggaa gtgccacaat gaggtggtgc tggcacccat gtttgagaga ctctccacag 37560 

agtctgttca ggagcagctg ccctactctg tcacgctcat ctccatgccg gccaccactg 37620

aaggcaggcg gggcttctcc gtgtccgtgg agagtgcctg ctccaactac gccaccactg 37680

tgcaagtgaa agagtaagta ttttgagaac ccttcagcag gggttcttga gcagagtctg 37740 

taaatgggcc tcagagggct tagacctcca aagtctcatg cagaactccc tttattctca 37800 

tctcatatct ttctcctgga ccccactatg ctgtaaccgt acctgggcct tggcacttac 37860 

tgttctctct gcccaggcta ottcctaccc gatacttaag gcaagaatca ctcacctttc 37920 

aggtgtcagg tttcaggtca tgtttgctct ttgaaatcat ctggcttgat tatgtgtatt 37980

agttgtttat cttctatccc ctccactaga atgtaaattc cagaagaaac ttgctgtctt 38040 

attcagtgct gcatgcccag ggcttggaag agtacctggc atatagtagg agttgattga 38100

ttattatttt gtcagtcgag agaatgaatg gagaaaatgt ggtccatggc ccaaaagaag 38160 

ttaagaccct atcctagatt caggccagag accagatgga gaaagagtct gtgtctatct 38220

aataccagta atgtcgtacc tctggccgct taccatgtaa atattgattg tgtatctacc 38280 

atgtgttgga cactaggcta gtgcttgcac agcaggtgaa agatactaga gtttgggaag 38340

tcaggaggag ctaaggtctg ttctacaacc ttattagatg aagaggagag ggaattgtgt 38400 

tcagggcaga gggagaagca tttctccaaa agtaggagtc ttaatcatgt ctgatgtagg 38460 

ttgagtgtgg ccagaaaagg ggctgttaag tatagagggc ctggattatg aaaatccagc 38520 

agatccattg agagtttaag cagcaaggtg ttgtgaccaa gttaacattt tagaaggatc 38580 

actggtatgg aggttggatt ggagagggga aagcctaaag gtatagagac tagttaggaa 38640 
gctattgtag gctgggcatg gtggttcatg cctgtaatct cagcactttg ggaggctgag 38700
gtgggaggat tgcttgaggc caggagttga agaccaacct ggccaacata gcaagacccc 38760
gtctctgttt ttcttaatta aaagaaaagt ccagacgtag acatagtggc tcacgcctgt 38820 
aatgccagca ctttgggagg ccaaggtggg cagattgctt gaggtcaaga gtttgggatt 38880 
aggccaggcg cagtggctca cgcctgtaat cccagcactt tgggaggccg aggtgggcgg 38940 
atcacaaggt caggagatca agaccatcct ggctaacaca atgaaacccc gtctctacta 39000 
aaagtacaaa aattagccgg gcatggtggc ggacgcctgt agtcccagct actcgggagg 39060 
ctgaggcagg agaatggcgt gaacctagga ggcggagctt gctgtgagca gagatcacgc 39120 
cactgcactc cagcctgagc gacagagcga gactccatct caaaaaaaaa aaagagtttg 39180 
ggattagcct ggccaacatg gcaaaacccc atctctacaa aaagtacaaa aaaattagct 39240 
gggtatggtg gtgcgcgcct gtaatcccag ttactcagga ggctgaggca tgagaattgc 39300 
ttgagcctgg gaggtggagg ttgcagtgag cccagatcat gccactgcac tccagcctgg 39360 
atgacagagt aagatgccat ctcaaataaa aattaaaaac aaagtttaaa aaaaaaatag 39420 
aagctattac cgtgatccag gtaagagatg tgaataacta caatgatgga aagaaggcag 39480 
agttcttaga gatgggagta ggagagatga gggaactcca gattgggaag atgatgttca 39540 
agtttctggc ttaggccaca gggtgagtgg caattccctt cactgagatg gggcatcctg 39600 
gaaaaggtgt tgcctttctg tgtgggtatc ctgggcccct taggggccac tggtggcctg 39660 
ggacctggta aaccttccct gcacaagcag aattggtcaa gcaggttttt aggacatctt 39720 
taccctgcct caactcttgt ctggcccagg gtcaaccgga tgcacatcag tcccaacaat 39780 
cgaaacgcca tccaccctgg ggaccgcatc ctggagatca atgggacccc cgtccgcaca 39840 
cttcgagtgg aggaggtaga gtgtgtgtct aatctgtctt gtgagggtgg gacatggaac 39900
agatcctctg ggaaatcagg ctgtagcctt taccttttcc tacccccagc ccatctcttt 39960
gtcttagcat tgagcctgtg accactggtg acctatttca gcgtaacagg ttcccagggt 40020 
agcagggatg gttgatggac gggagagctg acaggatgcc aggcagaggg cactgtgagg 40080 
ccactggcag ctaaaggcca ccattagaca agttgagcac tggccacact gtgcctgagt 40140 
catctgggtt ggccatgggt ggcctgggat ggggcagcct gtgggagctt tatactgctc 40200 
ttggccacag gtggaggatg caattagcca gacgagccag acacttcagc tgttgattga 40260 
acatgacccc gtctcccaac gcctggacca gctgcggctg gaggcccggc tcgctcctca 40320 
catgcagaat gccggacacc cccacgccct cagcaccctg gacaccaagg agaatctgga 40380 
ggggacactg aggagacgtt ccctaaggtg ccacctccca ccctggctct gttctgtcct 40440 
atgtctgtct ctcggatgaa gctgagctgg ctttcagaag cctgcagagt taggaaagga 40500
accagctggc cagggacaga ctatgaggat tgtgctgacc cagctgcccc tgtggggatc 40560 
acagtttaca gccagagcct gtgcggaccc agctgtctgc caggtttcct tagaaacctg 40620 
agagtcagtc tctgtccact gaactcctaa gctggacagg aggcagtgat gctaaaccct 40680 
gaagggcaac atggcctatg gagaaagcat ggagctcaga gcctggagta cgggcacaga 40740 
taggattgaa taaattgtgt agaaagactt tgaaaacaat aaagcaaaag atgaatgaac 40800 
gtttttttta gacttgaggg accaacaacc cccaaacccc agattctgcc aggtccatgg 40860 
ggaaggagaa gttgccttga gtggaagccc caagtaggga gacttacaga aaagaagtca 40920 
agagcactgg ctcccaggca gaaatactga taccctactg gggcttcagg ctgagctcct 40980 
cccttcacaa atcacttcat ctctctgagc ctgtttctgc atctgtgaca taagatggta 41040 
agataaaggt ggctgtctca ccaattatgt aaggattaaa tgtggaaaag gacataaagt 41100

tgtatagtgc tgccataggg acagtgttca gtaaacgtga cacattctta gtatcactaa 41160

gaatcaggtt cttggccagg caccgtggct catgcctgta atcccaacac tctgggaggc 41220 

ctaggtcgga ggatggcttg aacacaggag tttgagacca gcctgagcaa catagtgaga 41280 

cactgtctct acaaaaaaaa aataataata ataattgttt ttaattagat gggcagggca 41340 

ctgtggctca cacctgtaat cccagcactt tgggaggcca aggccggagg attgcttgag 41400 

gccaggagtt caggagcagc ctgggccaca ttcctgtctc tacaaagaat aaaaaagtta 41460 

actgggcatg gtggcacatg cctgtaatcc cagctactca agaggctgag gaggaggatt 41520 

gcctgagccc aggagttcaa gactgcagtg agccttgatc acaccactgt actacagctt 41580 

gggcaacaga gtgagacctt gtctccaaaa aaaaaagttt gttttttttt atccactctc 41640 

ctcaccaaac aaactgagta agttagagcc ctctcagctg gcatgtgttg gaaacagtgc 41700 

cctctcatta aagtgctgcc ctcactccca ttgcctcttg gccttggtca gtatgatgaa 41760 

attagtggga ggcagggcaa cagagggcag ggaagagcta gaaatccatg gcctggaaaa 41820 

gggaagattt gggagtggcc aggtatctgt agagccacca tgcagaggag gggggcagct 41880 

agccttgtgt gctctggtgg gcatggtcag caggaggcag agcaaaagga caagggtaag 41940 

taaacctgta ggtcgggaca agccaagagc catccagcgt cagtcctctc tgggtagccc 42000 

aagtaaagca ggagcatacc ccagagagaa agttcgcagg gctgttcacc tgcagtgctg 42060 

tggacttcaa ccttcttgtt ccttcttcag taagtgaaaa taacagtcat tgaccatgac 42120

tattatcgac cgcttttgaa aatgtaaaca tagtgacttt attgctgtaa aaatcatacg 42180 

tgtttatcat cttaaaattc aggaaacatg gacaggtaca aagatgtgca aaatatcatc 42240

caaaatccca tttgctggcc aggcacggtg gctcacgcct gtaatcccag cacattggga 42300 

ggccgaggcg ggcaaatcac ttgaggtcag gagtttgaga ccagcctggc caacatggtg 42360 

aaaccctatc tctactaaaa atacaataat taggctgggc gcagtggctc acgcctataa 42420 

tcccagcact ttgggaggcc gaggtgggcg aatcacaagg tcaggagttt gagactagcc 42480 

tggccaatat ggtgaaaccc catctctact oaaaatacaa aaattagggc cgggtgtggt 42540 

ggctcacgcc tgtaatccca gcacttaggg aggccgagac agatggatcg cgagatcagg 42600 

agttcgagac caacctagcc aacatggtga aaccccatct ctactaaaaa aatacaaaaa 42660

ttattcggtt gtggtggcac acgcctgtaa tcccagctac ttgggaggct gaggcaggag 42720 

aatctcttga acctgggagg cagaggttgc agtgagtgga gatcccgccg ttgcactcca 42780 

gcctgggcga cagagtgaga ctccatcaaa aaaaaaaaaa aaaaaaaaaa aaattagccg 42840

ggcgtggtgg cgtgcaccta tactcccagc iacttgggag gctgaggcag gagaatcgct 42900 

tgaacctgga aggcggaggt cgcagtgagc cgagatcgtg ccattgcact tcagcctggg 42960 

cgacagagcg agactctgtc tcaaaaataa taataataac aataactagc cgggcctggt 43020 

ggcacatgcc tgtagtccca gttactcagg aggcggaggc atgagactca ggtgaactag 43080 

ggagacagag gttgcagtga gccaagatca caccactgca ctccagcctg gttgacagag 43140

cgagactctg tctcaaaaaa aaaaaaatcc catttgctca ttttttggat actagtataa 43200 

ctatcactct aaaccagtta gtacttaaat caagcagata tgggagatgg tgaattacca 43260 

tctacagtgt tgtcatatat gtcacatact gagcattatc agctagtaga atctagttaa 43320 

ttgttctatg tgtgatgtat gcagagttcc cattttgaat gtgtttttac tatgcttaaa 43380 
taaatgactg atgtcagcaa ccccaaaatg atacatctga tgtaagagcc cctgttcccc 43440
aataataaca tctaaactat agacattgga atgaacaggt gcccctaagt ttcctccctc 43500 
cagggtttct tggccggtct ctgaggacta cacatcccta ctcccgtctt tcctcatctt 43560 
caggcgcagt aacagtatct ccaagtcccc tggccccagc tccccaaagg agcccctgct 43620 
gttcagccgt gacatcagcc gctcagaatc ccttcgttgt tccagcagct attcacagca 43680 
gatcttccgg ccctgtgacc taatccatgg ggaggtcctg gggaagggct tctttgggca 43740 
ggctatcaag gtgagcgcag gcaacaattg ctttgctctt ctgcccccag tccctctgtc 43800 
actgtctttc ggggatttct catcacttgg ccccacccca caccatgcag gatgccaggc 43860
ctccttcctg gctttgggtg ttggtgtgag aggtatcctt cacccccacc caggccacct 43920 
aaggtcaatg ttgctgttac agtgagcttg tggacctgga gatccaggtt gggttgagct 43980 
gtgcctgtgg ccctcctgcc tccagtcagt gggtgtttgt taggtgcctg cagacctcag 44040
taccgggcat gctacaagga gcacacaggg gaatggctcc tgcctccctg gtgaacagtc 44100
tcagggacta acctctctct ttctctcctc ctcctcctct tctgctgaga actgggaggg 44160
ggggtcaggt aagacgtgtg tctcagcttg ggggcagcag ggctggagag ctcacccccg 44220
atccacccag ctccctggtg catgtctttg gcactgacct tcctgccccc agacttctgt 44280 
tcactcagga gactcacttc tatgccaaat gaccagagcc cctgcttggc ttggcagcat 44340
cccctcctgc cttcttcccc acttcccttt tctgggttct tgcctgtcct ctgtgcatgc 44400 
ccagctctcc aggaaagagg gtttgcttcc gtgtgagtcc catgttgctc cacgctgcat 44460
cttccacaca tgaactctgt cattctgacc cggctcagtg tgccctccaa gggatgggat 44520
ggccagctgc atagattttc tcaaacagtt ctccagaact tcctctggtc tcagcaccat 44580 
taacagtcac cctccctgta ggtgacacac aaagccacgg gcaaagtgat ggtcatgaaa 44640
gagttaattc gatgtgatga ggagacccag aaaacttttc tgactgaggt aagaagatgg 44700 
agggggcccg ggaggttggt gtcaccattg gaagagagaa gaccttacaa ataatggctt 44760
caagagaaaa tacagtttgg aattactgtc ttaaagacta agcagaaaag agccctagag 44820
gaatctccca ctccctctaa attacagcgt aattatttgt tcaatgaaca cttactaaaa 44880
gcaacacaaa cagggtacaa gggatgcagt aacaaaagat acagggttca gaagagctct 44940 
caggttatga ggatgatgga catgaaaaca ctccaattta gtacaactca atgttataat 45000 
cctcacctga acgccctgcb aagggagect ggaggggagc tccctgagca ctcacactcc 45060 
ttgggcattt acagttttca ctacccctcc caagttactt catggagtaa cttaagttgg 45120 
ggacacctgt ggtctgggta ttgccctcca agecaefctgg ccactcccac cccagttctc 45180 
ccaatgcagt tccaagggta aggcctatga agccatctcc atctatatgg tggtggtctt 45240
ccctcatcct gatcttagtg ccctgtcata tcacaagata ggaggtagga gatacaggtg 45300 
gtaacacttg tcaagctgat tccttggagg gaagaggtaa ggaagacagt gagaagttaa 45360 
ccaccagctt tccttggctt cccccacccc caggtgaaag tgatgcgcag cctggaccac 45420
cccaatgtgc tcaagttcat tggtgtgctg tacaaggata agaagctgaa cctgctgaca 45480
gagtacattg aggggggcac actgaaggac tttctgcgca gtatggtgag cacaccaccc 45540
catagtctcc aggagccttg gtgggttgtc agacacctat gctatcacta ccctaggagc 45600
ttaaagggca gaggggccct gctttgcctc caaaggacca tgctgggtgg gactgagcat 45660
acatagggag gcttcactgg gagaccacat tgacccatgg ggcctggacc acgagtggga 45720 
cagggctcaa cagcctctga aaatcattcc ccattctgca ggatccgttc ccctggcagc 45780 
agaaggtcag gtttgccaaa ggaatcgcct ccggaatggt gagtcccacc aacaaacctg 45840

ccagcagggc gagagtaggg agaggtgtga gaattgtggg cttcactgga aggtagagac 45900

cccttcctat gcaacttgtg tgggctgggt cagcagctat tcattgagtt tgtctgtgtc 45960 

actgaaactg accccagcca actgttctca gttcacagcc ctgttttcaa agaattacac 46020 

otctctaaag gcaaacaggg cacggacaag gcaaactgga gaggcaaact gtagcctgag 46080 

otggcctggg cttgccatca caggtattca ggtgctgagg gcccttagac caactagagc 46140 

occtcactgc ctaggaaatc aatgaagggg aaatgagttc tagcggagcc ctgaaggatc 46200 

agaattggat oaagttctta ttggcagaga ggcaccagga ttgaagtgac aggagcaaag 46260 

acctgggagg aaagaggaga aaatcatcta tttcacctgg aaacaaatga ttccaagcat 46320 

agaaataata acagctgaca agtactgagt gccctctata tgctaggcac tgggctgagg 46380 

gattaacatg catgtgcatg tttattcctc atgacaacct tggtttccag ataagctgga 46440 

ctggaaaggg acagagctgg gatcctgggc taatcagtct ggtcgccaag cctgagactt 46500 

tagccactgc ccttcacatg ggggtccatg aaaatagtag tagtctggaa cagtttgggg 46560 

gtacatcaag gtcgctgtgt tttaagctat ggagtctgga ctataggaga caaatgtaaa 46620 

agagtttttt ggttgactgg ctttttggtt tttttgtttg tttgtttgtt tgtttgtttg 46680 

tttgtttgtt ttttcctgtt tctggggctt gaatcaggaa ggaggttttt ttgttgttgt 46740 

tgttttgaga aaggatattg ctctgttgcc cagactggag tgcagtggca cgatcatggc 46800 

tcactacagc ttcgacctcc tgggctcaag caatcctcct gccttagcct cccaagtagc 46860 

tggactacag gtgtgtacca ccacacctaa ttttttgaat ttttttttct tttttttttt 46920 

tttttttttt ggtagagaca ggttctcact ttgttgccca ggcctgaatc tcaaactcct 46980 

gggctcaagc attcctcctg cctcgccctc ccaaagtgtt gggattacag ttgtgagcca 47040 

ccatgcccgg caggaaaaga tttttaagca agaaagctta agagctgtgg tttttccaaa 47100 

atgagtctgg gctggcacag tggctcatgc ctgtaatccc agcacttttt tgggaggccg 47160 

aggtgagtgg atcacttgag gtcaggagtt tgagaccagc ctggccaact ggtgaaaccc 47220 

ctgtttctac taaagaaaaa aatgcaaaaa ttagctgggc gtggtggtgc acgcctgtag 47280 

tcccagctac tcaggaggcc gaggcaggag aatagcttga acctgggagg cagaagttgc 47340 

agtgagccaa gatcacacca ctgcattcca gcctgggtga cagagtgaga cttcatctca 47400 

aaaaaaaaaa aaaagagaga ctgatatggt tagtacattg gggtggaatg cggagggtcc 47460 

agggaatgga gccctgcata gggggctaat gaaacatttc agatttctga attaaggtag 47520 

tggctgtggg gacaggagcc tgggaggcag ggtggagtca gaatggagag actggttggc 47580

aatgagggaa caggaggagg aggaggagga gttacgagtg gcttgaggtg tcacttacca 47640 

gacatttggg ggatggggga tagccgtgat tgttgagcaa ctggtttggg aagagctagc 47700 

attgatccct gctgttctgt gctagcagaa cctatcagca tcttctgggc aggaaactgg 47760 

ctccatgaga ctggcttagg gagaggctgc tagtcaccta atctgcagag aaggggcagc 47820 

tggagctgtg ggacagaaga ggcatccatg tagctggtgg gggtgtctca gcttgtgaag 47880 

aggagatggc tttgagcagg gctgacactg aaaaggctgg aagaaaaaaa cagacacaca 47940 

agagtctcag gatcaggtag cataggaaag ttgtggacag tctttgagga gcactccctc 48000 

aggcaggcag gcaggcaggt catgagctat agcgattcag gaagagctcc ctgggtgtgt 48060 

gagcagctcc aggagcctaa gggatgaaag tagtattgca gggggctgga gagcaaggag 48120 
tggctccttc tacatttgca agggaaggag aaaggaagtt gctcctgaga gtggtaagag 48180
tcagtggtgg aggcctggag aggagacata acaaacaaat ttgttgacaa acattttggt 48240 
aggaaggggg agagcttaaa gtttagacag tggggaaggt ggagtcttag aggaggtgaa 48300 
tgtctgaaag acagagctag ctggagcaag aagtcacttc tctgttgcag gcaggaagga 48360 
tccaaagtgg ctcaagccag agattgggag agtggggagg agggagcagc ctggatctaa 48420 
gtaaaatggg tagaggtgga gggggtgctg caacggccag ggttttctga agttggggac 48480
attaggagag agctgtgagg gctttggcca gccactgtgc tagtgattgg tgaaccaaag 48540 
gatgggcagg agatggcagc agggaagcag aggaagtcca ggcttcctgt tggtattggg 48600
acaagggaga ggccatagga ggccctggcc ctgttgtcca ggttgggttc tgaagctggg 48660 
tgggcatggc ctggtaggag agcatctatg gcgcccaatt ccagattcag ggtctagttg 48720 
atttgctggc cctgtagcct cagctcatgc ttctgttcca ggcctatttg cactctatgt 48780 
gcatcatcca ccgggatctg aactcgcaca actgcctcat caagttggta tgtcccactg 48840 
ctctgggcct ggcctccagg gtcctatcct tcctggcttc cttgtcacaa aggaggctga 48900 
cttgtcccct ctggctagag ggcagaggtg ttgcctagga gctcctatct ttcccttcct 48960 
gcttcttcca atgcccttct ctgtcctctg ggagctccga gacacacaca gacataattt 49020 
caccttctct cattagcaac ctttgaaata atttgattag aagggacttc agaagtttgt 49080 
tgactatatg tagaaaaccc tgtcatttta cctgcttttg ccccatagta gtcttgtaaa 49140 
acagttcatt gctgacccca ttttacagtg gtggcacctg aagcctcagc ctgaggccac 49200 
cgagctagta aatttacagg gaccagtttg agaccagcat tcctcccact gcccctcagc 49260 
tgtggtggtt acaatgttgt ttgtcttact gacttgctat ctggcttcct gggtgtctac 49320 
cggctggccc tggctctgcc ctctagaccc acaccacgca atcttcattc ctttcccaca 49380 
tgactgccct gtagctattc aaagagcttg tctcccccaa gtctccccat ctactgcctc 49440 
caccttgcct ttttctgtct tatcctggtt ctagccactg cctgaaatca ttttaggaat 49500 
aagacaggac agggaaaaac aaaagcaacc ccctgtccca cctctgagtt ccactctcca 49560 
agtccctgag cctcacctcc agggctccag tggctctgcc atgaacccac tgtgggctgg 49620 
gagtctgctg tgcacagata ccagaccctc agaaacacaa atgccaagtg tgtctgtttt 49680 
tttgttttgt tttgttttgt tttttagatg gagtctcatt ctgtttccca ggctggagtg 49740 
cagtggtgca atcttggctt actgcagcct ctacctcccg ggttctagtg attgttctgc 49800 
ttcagcctcc cagtagctag gactacaggc gtgtgccacc acgcccagct aatttttttt 49860
tttttttttt tgtattttta gtagagacag ggttttgcca tgttggccag gctggtcttg 49920 
aactcctgac ctcaggtgat tcacccgcct tggcctccca aagttctggg attacaggtg 49980
gaagccaccg tgcctggcct gagtgtgtct atttgataga gctttctgct ctgattctcc 50040 
cttgctatac accttttctc cccttctcag tggcttctct tgcctatgct tcctccccag 50100 
ggccaggttt gagaacatcc ccatgaagtc ctgacctgtc ttttatccta ccaggacaag 50160 
actgtggtgg tggcagactt tgggctgtca cggctcatag tggaagagag gaaaagggcc 50220 
cccatggaga aggccaccac caagaaacgc accttgcgca agaacgaccg caagaagcgc 50280 
tacacggtgg tgggaaaccc ctactggatg gcccctgaga tgctgaacgg tgagtcctga 50340 
agccctggag gggacacccg cagagggagg acagatgctg cccttgcatc agagccctgg 50400 
gaattccagg ggaggcctgt gaagcgtagg accggatacc cagagctgag gatatttttc 50460 
ccttgccagg tggggcctca cgatttagct cctgagctca gggggctggg aactgatcag 50520
tgtcccatca taaaaaataa ggtgagttct gactgtggca tttgtgcctd agggatcgct 50580
[illegible] gctattgtcc cagctttagc cttctctctc catggtgaga actgaagtgt 50640
ggtgccctct acrtqqataat gctcaaacca accagagatg ctggttggga ttcttgaaat 50700
[illegible] aggcctcaga aatggtctga atacaatcca ttttggagtc tgaggcccag 50760
[illegible] tgaattgcct aggagcatac agctgcctaa tggcagaggc tagatgaacc 50820
[illegible] tcttttccac tttaacgtgc agtttcatcc taggcagtgt tatgttataa 50880
gggctctcca aggcagttca cctacggctg aggaaggact attttcaggt ggtgtctgcg 50940
[illegible] [illegible] ccctacagaa cctgttctag ccctagttct tagctgtggc 51000
ttagattgac cctagaccca gtgcagagca ggtaagggat gtaaacttaa cagtgtgctc 51060
tcctgtgttc cccaaggaaa gagctatgat gagacggtgg atatcttctc ctttgggatc 51120
[illegible] aggtgagctc tggcaccaag gccatgcccg aggcagcagg cctagcagct 51180
ctgccttccc tcggaactgg ggcatctcct cctagggatg actagcttga ctaaaatcaa 51240
[illegible] gggttttatg gtttataacg catctgcaca tctttgccac gttcgtgttt 51300
cattggtctt aagagaagga ctggcagggt ttttttgttt tagatggagc ctcacttcgt 51360
tgccca55ct [illegible] ggcacaatct gggctcactg caacctctgc cttctgggtt 51420
caagtgattc tcctgcctca gcctcccaag tagctgggac taccggcaca caccaccatg 51480
[illegible] ttttgtattt ttagtagaga cagggtttca ccatgttggc caggctggtc 51540
[illegible] gacctcaggt gatccgcctg cctcagcctc taaaagtgct ggaattaata 51600
[illegible] acctcgcccg gccaggtttt tttttttttt tttttagttg aggaaactga 51660
ggcttggaag [illegible] cttgcacatg gtcgataagg ggcagatgag actcagaatt 51720
ccagaaggaa gggcaagaga ctgttcatgt ggctgtctag ctagctcttg ggccaaatgt 51780
agcccttctc agttcccttc aagtagaagt agccactcta ggaagtgtca gccctgtgcc 51840
sggtaccscg tggacagagt gaggaatctt ggaaagattc ctacctttag gagtttagtc 51900
aaotaacaoc atatctcagc gactcaaaca cacacacatt caaagccttc tgtaattcct 51960
acaaagttgt gaggggtaga ggagaggaga gacaagggat ggttaggata atgaaggaat 52020
gttttgtttt tgtttttgtt tttgagatgg agtttcactc tgtcacccag gctggagtgc 52080
aaaaotacaa tcttggctca ctgcagcctc cgcctcccag gttcaagcaa tcctcctgcc 52140
tcagcctccc aaataactgg gactacaggt gtgcgccacc acgcctggct aatttttgta 52200
ttttcagtag agacaaoott tcgccatatt ggccaggctg gtctcaaatg cctgacctca 52260
[illegible] ccgcttcagc ctcccaaagt gctgagatta caggcatgag ctaccgtgcc 52320
tggccatgaa ooaaaatttg ttttaaaaaa ttgttttctt taatattaat tgaacacctc 52380
tgttcagagc [illegible] taccaoaaga tttcagacat gaatcagatc cagcacctca 52440
tagagcctta atctggcaco cacacacagc cacaaggaga cacagacaag gcagggtagg 52500
atgagtggaa gctaggagca gatgctgatt tggaacactt ggcttctgca gtgaagcccc 52560
ttcttagtcc tcttcagtaa cccagctctc agtggataca ggictggatt agtaagattt 52620
ggagagatga ttggggattg gggagagctc tctaacctat tttaccacct cctcttctgc 52680
cattcttcct gtccacatcc ccagcatccc tttcccttgc caagtatctg tggcctctgt 52740
agtcctttgt aaac&gctgt cttcttaccc tacagatcat tgggcaggtg tatgcagatc 52800
ctgactgcct tccccgaaca ctggactttg gcctcaacgt gaagcttttc tgggagaagt 52860
ttgttcccac agattgtccc ccggccttct tcccgctggc cgccatctgc tgcagactgg 52920
agcctgagag caggttggta tcctgccttt ttctcccagc tcacagggtc ctgggacgtt 52980 
tgcctctgtc taaggccacc cctgagccct ctgcaagcac aggggtgaga gaagccttga 53040
ggtcaagaat gtggctgtca acccctgagc catctgacaa cacatatgta caggttggag 53100 
aagagagagg taaagacata gcagcaagta atctggatag gacacagaaa cacagccatt 53160 
aaaagaaagt ttaaaagaag gaaattcacc caaaccattt gaatacagta agtgtattca 53220 
tctttcgata ttcccctgtc catatctaca catatacttt tttttatagt aaatagttct 53280 
gtattttgcc ctgcatttcc cttgtgttta ctatccagtc ttcctgttta tcatttttgt 53340 
cgacaacatg aaattctatt gagagactgt ctgaacatat tgtaatgtag atgttcaggt 53400 
ttttccagtt tctctttaca ataggtattt aactacagtg agcagtttta tgcatttagc 53460 
taatttctcc tttgaggaag tattttcaaa attaccttta ttcttctcag gtaataattt 53520 
cattattacc aaagttaccc taggtctttt caagtgtgtg gttaaaaaac gagaatctgg 53580 
ctgggcgcga tggctcacac ctgtaatccc agcactttgg gaggctgagg ctggtggatc 53640 
acctgaggtc tggagttcga gaccagcctg gccaacatgg tgaaacccca tctctactaa 53700 
aaatacaaaa cttagccagg catggtggca ggtgcctgta accccagcta cttgggaggc 53760 
tgaggcagga gaattgcttg aacccagggg cggaggttgc agtgagccga tatcacgcca 53820 
ttgcactcca gcctcggcaa caagagtgaa actctgtctc aaaaatgggg ttcttttcct 53880 
gccatcaaaa atcatgtttc ttttaaaaac aagttcaaac attaccaaag tttatagcac 53940 
aggaaatacg tcttctgtaa tctcccttaa ccaatatatc cctcaacatt ctcctcaccc 54000 
ccaactccac cctcccagga taaccagttg ggacataatc tttatttaaa aatggtttcc 54060 
ggatagagaa agcgcttcgg cggcggcagc cccggcggcg gccgcagggg acaaagggcg 54120 
ggcggatcgg cggggagggg gcggggcgcg accaggccag gcccgggggc tccgcatgct 54180 
gcagctgcct ctcgggcgcc cccgccgccg ccctcgccgc ggagccggcg agctaacctg 54240 
agccagccgg cgggcgtcac ggaggcggcg gcacaaggag gggccccacg cgcgcacgtg 54300 
gccccggagg ccgccgtggc ggacagcggc accgcggggg gcgcggcgtt ggcggccccg 54360 
gccccggccc ccaggccagg cagtggcggc caaggaccac gcatctactt tcagagcccc 54420 
ccccggggcc gcaggagagg gcccgggctg ggcggatgat gagggcccag tgaggcgcca 54480 
agggaaggtc accatcaagt atgaccccaa ggagctacgg aagcacctca acctagagga 54540 
gtggatcctg gagcagctca cgcgcctcta cgactgccag gaagaggaga tctcagaact 54600 
agagattgac gtggatgagc tcctggacat ggagagtgac gatgcctggg cttccagggt 54660
caaggagctg ctggttgact gttacaaacc cacagaggcc ttcatctctg gcctgctgga 54720
caagatccgg gccatgcaga agctgagcac accccagaag aagtgagggt ccccgaccca 54780 
ggcgaacggt ggctcccata ggacaatcgc taccccccga cctcgtagca acagcaatac 54840 
cgggggaccc tgcggccagg cctggttcca tgagcagggc tcctcgtgcc cctggcccag 54900
gggtctcttc ccctgccccc tcagttttcc acttttggat ttttttattg ttattaaact 54960 
gatgggactt tgtgttttta tattgactct gcggcacggg ccctttaata aagcgaggta 55020 
gggtacgcct ttggtgcagc tcaaaaaaaa aaaaaaaaat gatttccagc ggtccacatt 55080 
agagttgaaa ttttctggtg ggagaatcta taccttgttc ctttataggc caaggaccgc 55140 
agtccttcag taacaccagt gtaaaagctt gaggagaaat tgtgaagcta cacagtattt 55200 
gttttctaat acctcttgtc attctaaata tctttaattt attaaaaaat atatatatac 55260 
agtattgaat gcctactgtg tgctaggtac agttctaaac acttgggtta cagcagcgaa 55320
caaaataaag gtgcttaccc tcatagaaca tagattctag catggtatct actgtatcat 55380
acagtagata caataagtaa actatattga atattagaat gtggcagatg ctatggaaaa 55440
[illegible] caagtaaaga cgattgttca gggtaccagt tgcaatttta aatatggtcg 55500
tcagagcagg cctcactgag gtgacatgac atttaagcat aaacatggag gaggaggagt 55560
aagcctgagc tgtcttaggc ttccggggca gccaagccat ttccgtggca ctaggagcct 55620
[illegible] attccacctt tgataactgc attttctcta agatatggga gggaagtttt 55680
tctcctattg tttttaagta ttaactccag ctagtccagc cttgttatag tgttacctaa 55740
tctttatagc aaatatatga ggtaccggta acattatgcc catttctcac agaggcacta 55800
[illegible] gagtttgcct gacgttatac aaccaggaag tagctgagcc tagatccctt 55860
ccacccaccc catggccctg ctcatgttcc acctgcctct aatttacctc ttttccttct 55920
[illegible] tctcgaaatt ggaggactcc tttgaggccc tctccctgta cctgggggag 55980
ctgggcatcc cgctgcctgc agagctggag gagttggacc acactgtgag catgcagtac 56040
ggcctgaccc gggactcacc tccctagccc tggcccagcc ccctgcaggg gggtgttcta 56100
cagccagcat tgcccctctg tgccccattc ctgctgtgag cagggccgtc cgggcttcct 56160
gtggattggc ggaatgttta gaagcagaac aagccattcc tattacctcc ccaggaggca 56220
[illegible] acaccaagga aatgtatctc cacaggttct ggggcctagt tactgtctgt 56280
aaatccaata cttgcctgaa agctgtgaag aagaaaaaaa cccctggcct ttgggccagg 56340
aggaatctgt tactcgaatc cacccaggaa ctccctggca gtggattgtg ggaggctctt 56400
gcttacacta atcagcgtga cctggacctg ctgggcagga tcccagggtg aacctgcctg 56460
tgaactctga agtcactagt ccagctgggt gcaggaggac ttcaagtgtg tggacgaaag 56520
aaagactgat ggctcaaagg gtgtgaaaaa gtcagtgatg ctcccccttt ctactccaga 56580
tcctgtcctt cctggagcaa ggttgaggga gtaggttttg aagagtccct taatatgtgg 56640
taoaacaggc caggagttag agaaagggct ggcttctgtt tacctgctca ctggctctag 56700
ccagcccagg gaccacatca atgtgagagg aagcctccac ctcatgtttt caaacttaat 56760
actggagact ggctgagaac ttacggacaa catcctttct gtctgaaaca aacagtcaca 56820
agcacaggaa gaggctgggg gactagaaag aggccctgcc ctctagaaag ctcagatctt 56880
ggcttctgtt actcatactc gggtgggctc cttagtcaga tgcctaaaac attttgccta 56940
aagctcgatg ggttctggag gacagtgtgg cttgtcacag gcctagagtc tgagggaggg 57000
gaotgggagt ctcagcaatc tcttggtctt ggcttcatgg caaccactgc tcacccttca 57060
acatgcctgg tttaggcagc agcttgggct gggaagaggt ggtggcagag tctcaaagct 57120
[illegible] gagagatagc tccctgagct gggccatctg acttctacct cccatgtttg 57180
ctctcccaac tcattagctc ctgggcagca tcctcctgag ccacatgtgc aggtactgga 57240
aaacctccat cttggctccc agagctctag gaactcttca tcacaactag atttgcctct 57300
tctaagtgtc tatgagcttg caccatattt aataaattgg gaatgggttt ggggtattaa 57360
tgcaatgtgt ggtggttgta ttggagcagg gggaattgat aaaggagagt ggitgctgtt 57420
aatattatct tatctattgg gtggtatgtg aaatattgta catagacctg atgagttgtg 57480
ggaccagatg tcatctctgg tcagagttta cttgctatat agactgtact tatgtgtgaa 57540
gtttgcaagc ttgctttagg gctgagccct ggactcccag cagcagcaca gttcagcatt 57600
gtgtggctgg ttgtttcctg gctgtcccca gcaagtgtag gagtggtggg cctgaactgg 57660
gccattg&tc agactaaata cattaagcag ttaacataac tggcaatatg gagagtgaaa 57720
acatgattgg ctcagggaca taaatgtaga gggtctgcta gccaccttct ggcctagccc 57780 
acacaaactc cccatagcag agagttttca tgcacccaag tctaaaaccc tcaagcagac 57840 
acccatctgc tctagagaat atgtacatcc cacctgaggc agccccttcc ttgcagcagg 57900 
tgtgactgac tatgaccttt tcctggcctg gctctcacat gccagctgag tcattcctta 57960 
ggagccctac cctttcatcc tctctatatg aatacttcca tagcctgggt atcctggctt 58020 
gctttcctca gtgctgggtg ccacctttgc aatgggaaga aatgaatgca agtcacccca 58080 
ccccttgtgt ttccttacaa gtgcttgaga ggagaagacc agtttcttct tgcttctgca 58140 
tgtgggggat gtcgtagaag agtgaccatt gggaaggaca atgctatctg gttagtgggg 58200 
ccttgggcac aatataaatc tgtaaaccca aaggtgtttt ctcccaggca ctctcaaagc 58260 
ttgaagaatc caacttaagg acagaatatg gttcccgaaa aaaactgatg atctggagta 58320 
cgcattgctg gcagaaccac cgagcaatgg ctgggcatgg gcagaggtca tctgggtgtt 58380
cctgaggctg ataacctgtg gctgaaatcc cttgctaaaa gtccaggaga cactcctgtt 58440 
ggtatctttt cttctggagt catagtagtc accttgcagg gaacttcctc agcccagggc 58500 
tgctgcaggc agcccagtga cccttcctcc tctgcagtta ttcccccttt ggctgctgca 58560 
gcaccacccc cgtcacccac cacccaaccc ctgccgcact ccagccttta acaagggctg 58620 
tctagatatt cattttaact acctccacct tggaaacaat tgctgaaggg gagaggattt 58680 
gcaatgacca accaccttgt tgggacgcct gcacacctgt ctttcctgct tcaacctgaa 58740
agattcctga tgatgataat ctggacacag aagccgggca cggtggctct agcctgtaat 58800 
ctcagcactt tgggaggcct cagcaggtgg atcacctgag atcaagagtt tgagaacagc 58860
ctgaccaaca tggtgaaacc ccgtctctac taaaaataca aaaattagcc aggtgtggtg 58920
gcacatacct gtaatcccag ctactctgga ggctgaggca ggagaatcgc ttgaacccac 58980
aaggcagagg ttgcagtgag gcgagatcat gccattgcac tccagcctgt gcaacaagag 59040 
ccaaactcca tctcaaaaaa aaaaa 59065



<210> SEQ ID NO 4
<211> LENGTH: 265
<212> TYPE: PRT
<213> ORGANISM: Human

<400> SEQUENCE: 4

Leu Thr Glu Val Lys Val Met Arg Ser Leu Asp His Pro Asn Val Leu
1 5 10 15

Lys Phe Ile Gly Val Leu Tyr Lys Asp Lys Lys Leu Asn Leu Leu Thr
20 25 30

Glu Tyr Ile Glu Gly Gly Thr Leu Lys Asp Phe Leu Arg Ser Met Asp
35 40 45

Pro Phe Pro Trp Gln Gln Lys Val Arg Phe Ala Lys Gly Ile Ala Ser
50 55 60

Gly Met Ala Tyr Leu His Ser Met Cys Ile Ile His Arg Asp Leu Asn
65 70 75 80

Ser His Asn Cys Leu Ile Lys Leu Asp Lys Thr Val Val Val Ala Asp
85 90 95

Phe Gly Leu Ser Arg Leu Ile Val Glu Glu Arg Lys Arg Ala Pro Met
100 105 110

Glu Lys Ala Thr Thr Lys Lys Arg Thr Leu Arg Lys Asn Asp Arg Lys
115 120 125

Lys Arg Tyr Thr Val Val Gly Asn Pro Tyr Trp Met Ala Pro Glu Met
130 135 140

Leu Asn Gly Lys Ser Tyr Asp Glu Thr Val Asp Ile Phe Ser Phe Gly
145 150 155 160

Ile Val Leu Cys Glu Ile Ile Gly Gln Val Tyr Ala Asp Pro Asp Cys
165 170 175

Leu Pro Arg Thr Leu Asp Phe Gly Leu Asn Val Lys Leu Phe Trp Glu
180 185 190

Lys Phe Val Pro Thr Asp Cys Pro Pro Ala Phe Phe Pro Leu Ala Ala
195 200 205

Ile Cys Cys Arg Leu Glu Pro Glu Ser Arg Pro Ala Phe Ser Lys Leu
210 215 220

Glu Asp Ser Phe Glu Ala Leu Ser Leu Tyr Leu Gly Glu Leu Gly Ile
225 230 235 240

Pro Leu Pro Ala Glu Leu Glu Glu Leu Asp His Thr Val Ser Met Gln
245 250 255

Tyr Gly Leu Thr Arg Asp Ser Pro Pro
260 265
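
As a cross-check on the listing, a short stretch of the genomic sequence of SEQ ID NO:3 can be translated and compared with the corresponding residues of SEQ ID NO:4. The Python sketch below is illustrative only. It assumes the standard genetic code, and the fragment and reading frame (nucleotides 45394 to 45420 of the listing, compared with residues 4 to 12 of SEQ ID NO:4) were chosen by inspection rather than stated in the source. The helper names `translate` and `to_one_letter` are hypothetical.

```python
from itertools import product

# Standard genetic code (assumed), with codons enumerated in t, c, a, g
# order at each of the three codon positions.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Three-letter to one-letter amino acid codes, as used in the listing.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna):
    """Translate DNA in its first reading frame.

    Whitespace (the 10-base grouping of the listing) is ignored and a
    trailing partial codon is dropped. Any character that is not a, c, g
    or t (for example an OCR error) raises KeyError.
    """
    seq = "".join(dna.split()).lower()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def to_one_letter(listing):
    """Convert a space-separated three-letter residue string to one-letter codes."""
    return "".join(THREE_TO_ONE[residue] for residue in listing.split())


# Fragment copied from the SEQ ID NO:3 line ending at position 45420.
fragment = "gtgaaag tgatgcgcag cctggaccac"
# Residues 4 to 12 of SEQ ID NO:4.
residues = "Val Lys Val Met Arg Ser Leu Asp His"

print(translate(fragment))
print(translate(fragment) == to_one_letter(residues))
```

A check of this kind is also a practical way to spot character-level transcription errors in the nucleotide listing, since a mistyped base inside a coding region usually changes the translated residue.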



That which is claimed is: 

1. An isolated nucleic acid molecule consisting of a 
nucleotide sequence selected from the group consisting of: 

(a) a nucleotide sequence that encodes an amino acid 
sequence shown in SEQ ID NO:2; 

(b) a nucleic acid molecule consisting of the nucleic acid
sequence of SEQ ID NO:1;

(c) a nucleic acid molecule consisting of the nucleic acid
sequence of SEQ ID NO:3; and

(d) a nucleotide sequence that is completely complementary
to a nucleotide sequence of (a)-(c).

2. A nucleic acid vector comprising a nucleic acid molecule
of claim 1.

3. A host cell containing the vector of claim 2. 

4. A process for producing a polypeptide comprising 
culturing the host cell of claim 3 under conditions sufficient 
for the production of said polypeptide, and recovering the 
peptide from the host cell culture. 



5. An isolated polynucleotide consisting of a nucleotide
sequence set forth in SEQ ID NO:1.

6. An isolated polynucleotide consisting of a nucleotide
sequence set forth in SEQ ID NO:3.

7. A vector according to claim 2, wherein said vector is
selected from the group consisting of a plasmid, virus, and
bacteriophage.

8. A vector according to claim 2, wherein said isolated
nucleic acid molecule is inserted into said vector in proper
orientation and correct reading frame such that the protein of
SEQ ID NO:2 may be expressed by a cell transformed with
said vector.

9. A vector according to claim 8, wherein said isolated
nucleic acid molecule is operatively linked to a promoter
sequence.
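
For readers less familiar with the term, the "completely complementary" relationship recited in claim 1(d) can be illustrated computationally. The Python sketch below is an illustration only, not a construction of the claim. It assumes that complete complementarity means base pairing over the full length of both strands, each written 5' to 3', and the function names are hypothetical.

```python
# Watson-Crick pairing for the four DNA bases, in both cases.
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string, ignoring whitespace."""
    cleaned = "".join(seq.split())
    return cleaned.translate(COMPLEMENT)[::-1]


def is_fully_complementary(strand_a, strand_b):
    """True if strand_b pairs with strand_a over the full length of both."""
    a = "".join(strand_a.split()).lower()
    b = "".join(strand_b.split()).lower()
    return len(a) == len(b) and reverse_complement(a) == b


print(reverse_complement("atgcgc"))                # gcgcat
print(is_fully_complementary("atgcgc", "gcgcat"))  # True
print(is_fully_complementary("atgcgc", "gcgcaa"))  # False
```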

* * * * * 
HUMAN TRANSPORTER PROTEINS AND
POLYNUCLEOTIDES ENCODING THE 
SAME 

The present application claims the benefit of U.S. Provisional
Application No. 60/185,956, which was filed on Feb. 29, 2000
and is herein incorporated by reference in its entirety.



INTRODUCTION 



The present invention relates to the discovery,
identification, and characterization of novel human
polynucleotides encoding proteins that share sequence similarity
with mammalian transporter proteins. The invention encompasses
the described polynucleotides, host cell expression
systems, the encoded proteins, fusion proteins, polypeptides
and peptides, antibodies to the encoded proteins and
peptides, and genetically engineered animals that either lack
or over express the disclosed polynucleotides, antagonists
and agonists of the proteins, and other compounds that
modulate the expression or activity of the proteins encoded
by the disclosed polynucleotides that can be used for
diagnosis, drug screening, clinical trial monitoring, and
treatment of diseases and disorders.



BACKGROUND OF THE INVENTION



Transporter proteins are integral membrane proteins that
mediate or facilitate the passage of materials across the lipid
bilayer. Given that the transport of materials across the
membrane can play an important physiological role, transporter
proteins are good drug targets. Additionally, one of
the mechanisms of drug resistance involves diseased cells
using cellular transporter systems to export chemotherapeutic
agents from the cell. Such mechanisms are particularly
relevant to cells manifesting resistance to a multiplicity of
drugs.



SUMMARY OF THE INVENTION



The present invention relates to the discovery,
identification, and characterization of nucleotides that
encode novel human proteins, and the corresponding amino
acid sequences of these proteins. The novel human proteins
(NHPs) described for the first time herein share structural
similarity with mammalian ion transporters, calcium transporters
(particularly calcium transporting ATPases), sulfate
transporters, and zinc transporters.

The novel human nucleic acid sequences described
herein encode alternative proteins/open reading frames
(ORFs) of 1,177 and 374 amino acids in length (calcium-transporting
ATPase, SEQ ID NOS: 2 and 4), 970 (sulfate
transporter, SEQ ID NO:7), and 507 (zinc transporter, SEQ
ID NO:10) amino acids in length.

The invention also encompasses agonists and antagonists
of the described NHPs, including small molecules, large
molecules, mutant NHPs, or portions thereof, that compete
with native NHP, peptides, and antibodies, as well as nucleotide
sequences that can be used to inhibit the expression of
the described NHPs (e.g., antisense and ribozyme
molecules, and gene or regulatory sequence replacement
constructs) or to enhance the expression of the described
NHP polynucleotides (e.g., expression constructs that place
the described polynucleotide under the control of a strong
promoter system), and transgenic animals that express a
NHP transgene, or "knock-outs" (which can be conditional)
that do not express a functional NHP. Knock-out mice can
be produced in several ways, one of which involves the use
of mouse embryonic stem cell ("ES cell") lines that
contain gene trap mutations in a murine homolog of at least
one of the described NHPs. When the unique NHP
sequences described in SEQ ID NOS: 1-11 are "knocked-out"
they provide a method of identifying phenotypic
expression of the particular gene as well as a method of
assigning function to previously unknown genes.
Additionally, the unique NHP sequences described in SEQ
ID NOS: 1-11 are useful for the identification of coding
sequence and the mapping of a unique gene to a particular
chromosome.

Further, the present invention also relates to processes for
identifying compounds that modulate, i.e., act as agonists or
antagonists of, NHP expression and/or NHP activity that
utilize purified preparations of the described NHPs and/or
NHP product, or cells expressing the same. Such compounds
can be used as therapeutic agents for the treatment of any of
a wide variety of symptoms associated with biological
disorders or imbalances.

DESCRIPTION OF THE SEQUENCE LISTING 
AND FIGURES 

The Sequence Listing provides the sequences of the
described NHP ORFs that encode the described NHP amino
acid sequences. SEQ ID NOS: 5, 8, and 11 describe nucleotides
encoding NHP ORFs along with regions of flanking
sequence.

DETAILED DESCRIPTION OF THE 
INVENTION 

The NHPs described for the first time herein are novel 
proteins that may be expressed in, inter alia, human cell 
lines, fetal brain, pituitary, cerebellum, thymus, spleen, 
lymph node, bone marrow, trachea, kidney, fetal liver, liver, 
prostate, testis, thyroid, adrenal gland, salivary gland, 
stomach, small intestine, colon, adipose, rectum, 
pericardium, bone marrow, placenta, and gene trapped 
human cells. More particularly, the NHP that is similar to 
sulfate transporters (and the down-regulated in adenoma, or 
DRA, gene) is predominantly found in bone marrow and 
testis, and the zinc transporter-like NHP can be found 
expressed in the placenta. 

The present invention encompasses the nucleotides pre- 
sented in the Sequence Listing, host cells expressing such 
nucleotides, the expression products of such nucleotides, 
and: (a) nucleotides that encode mammalian homologs of 
the described polynucleotides, including the specifically 
described NHPs, and the NHP products; (b) nucleotides that 
encode one or more portions of the NHPs that correspond to 
functional domains, and the polypeptide products specified 
by such nucleotide sequences, including but not limited to 
the novel regions of any active domains); (c) isolated 
nucleotides that encode mutant versions, engineered or 
naturally occurring, of the described NHPs in which all or a 
part of at least one domain is deleted or altered, and the 
polypeptide products specified by such nucleotide 
sequences, including but not limited to soluble proteins and 
peptides in which all or a portion of the signal (or hydro- 
phobic transmembrane) sequence is deleted; (d) nucleotides 
that encode chimeric fusion proteins containing all or a 
portion of a coding region of an NHP, or one of its domains 
(e.g., a receptor or ligand binding domain, accessory protein/ 
self-association domain, etc.) fused to another peptide or 
polypeptide; or (e) therapeutic or diagnostic derivatives of 
the described polynucleotides such as oligonucleotides, anti-
sense polynucleotides, ribozymes, dsRNA, or gene therapy
constructs comprising a sequence first disclosed in the 
Sequence Listing. 

As discussed above, the present invention includes: (a) 
the human DNA sequences presented in the Sequence List- 
ing (and vectors comprising the same) and additionally 
contemplates any nucleotide sequence encoding a contigu- 
ous NHP open reading frame (ORF) that hybridizes to a 
complement of a DNA sequence presented in the Sequence 
Listing under highly stringent conditions, e.g., hybridization 
to filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl
sulfate (SDS), 1 mM EDTA at 65° C., and washing in
0.1xSSC/0.1% SDS at 68° C. (Ausubel F. M. et al., eds.,
1989, Current Protocols in Molecular Biology, Vol. I, Green
Publishing Associates, Inc., and John Wiley & Sons, Inc.,
New York, at p. 2.10.3) and encodes a functionally equiva- 
lent gene product. Additionally contemplated are any nucle- 
otide sequences that hybridize to the complement of a DNA 
sequence that encodes and expresses an amino acid 
sequence presented in the Sequence Listing under moder- 
ately stringent conditions, e.g., washing in 0.2xSSC/0.1% 
SDS at 42° C. (Ausubel et al., 1989, supra), yet still encodes 
a functionally equivalent NHP product. Functional equiva- 
lents of a NHP include naturally occurring NHPs present in 
other species and mutant NHPs whether naturally occurring 
or engineered (by site directed mutagenesis, gene shuffling, 
directed evolution as described in, for example, U.S. Pat. 
No. 5,837,458). The invention also includes degenerate 
nucleic acid variants of the disclosed NHP polynucleotide 
sequences. 

Additionally contemplated are polynucleotides encoding 
NHP ORFs, or their functional equivalents, encoded by 
polynucleotide sequences that are about 99, 95, 90, or about 
85 percent similar or identical to corresponding regions of 
the nucleotide sequences of the Sequence Listing (as mea- 
sured by BLAST sequence comparison analysis using, for 
example, the GCG sequence analysis package using stan- 
dard default settings). 
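
As an illustration only, the percent-identity thresholds recited above (about 99, 95, 90, or 85 percent) can be sketched as a simple calculation over two sequences that have already been aligned. This is not the BLAST or GCG procedure named in the text; the function names, the gap character, and the choice to ignore gapped columns are assumptions made for the sketch.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two pre-aligned, equal-length sequences.

    Columns in which either sequence has a gap ('-') are ignored; this
    convention is an assumption of the sketch, not part of the source text.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_threshold(aligned_a: str, aligned_b: str, threshold: float = 85.0) -> bool:
    """True when the two aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold


if __name__ == "__main__":
    # Hypothetical 10-base sequences differing at one position.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    print(meets_threshold("ACGTACGTAC", "ACGTACGTAA", 85.0))
```

A real comparison would first produce the alignment with a dedicated tool; the sketch only shows how a threshold such as 85 percent would be applied once an alignment exists.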

The invention also includes nucleic acid molecules, pref- 
erably DNA molecules, that hybridize to, and are therefore 
the complements of, the described NHP nucleotide 
sequences. Such hybridization conditions may be highly 
stringent or less highly stringent, as described above. In 
instances where the nucleic acid molecules are deoxyoligo- 
nucleotides ("DNA oligos"), such molecules are generally 
about 16 to about 100 bases long, or about 20 to about 80, 
or about 34 to about 45 bases long, or any variation or 
combination of sizes represented therein that incorporate a 
contiguous region of sequence first disclosed in the 
Sequence Listing. Such oligonucleotides can be used in 
conjunction with the polymerase chain reaction (PCR) to 
screen libraries, isolate clones, and prepare cloning and 
sequencing templates, etc. 

Alternatively, such NHP oligonucleotides can be used as 
hybridization probes for screening libraries, and assessing 
gene expression patterns (particularly using a microarray or
high-throughput "chip" format). Additionally, a series of the 
described NHP oligonucleotide sequences, or the comple- 
ments thereof, can be used to represent all or a portion of the 
described NHP sequences. An oligonucleotide or polynucle- 
otide sequence first disclosed in at least a portion of one or 
more of the sequences of SEQ ID NOS: 1-11 can be used 
as a hybridization probe in conjunction with a solid support 
matrix/substrate (resins, beads, membranes, plastics, 
polymers, metal or metallized substrates, crystalline or poly-
crystalline substrates, etc.). Of particular note are spatially
addressable arrays (i.e., gene chips, microtiter plates, etc.) of
oligonucleotides and polynucleotides, or corresponding oli-
gopeptides and polypeptides, wherein at least one of the
biopolymers present on the spatially addressable array com- 
prises an oligonucleotide or polynucleotide sequence first 
disclosed in at least one of the sequences of SEQ ID NOS:
1-11, or an amino acid sequence encoded thereby. Methods 
for attaching biopolymers to, or synthesizing biopolymers 
on, solid support matrices, and conducting binding studies 
thereon are disclosed in, inter alia, U.S. Pat. Nos. 5,700,637, 
5,556,752, 5,744,305, 4,631,211, 5,445,934, 5,252,743, 
4,713,326, 5,424,186, and 4,689,405, the disclosures of
which are herein incorporated by reference in their entirety. 

Addressable arrays comprising sequences first disclosed 
in SEQ ID NOS: 1-11 can be used to identify and charac-
terize the temporal and tissue specific expression of a gene.
These addressable arrays incorporate oligonucleotide 
sequences of sufficient length to confer the required 
specificity, yet be within the limitations of the production 
technology. The length of these probes is within a range of 
between about 8 to about 2000 nucleotides. Preferably the
probes consist of 60 nucleotides and more preferably 25 
nucleotides from the sequences first disclosed in SEQ ID 
NOS: 1-11.

For example, a series of the described oligonucleotide 
sequences, or the complements thereof, can be used in chip
format to represent all or a portion of the described 
sequences. The oligonucleotides, typically between about 16 
to about 40 (or any whole number within the stated range) 
nucleotides in length can partially overlap each other and/or 
the sequence may be represented using oligonucleotides that
do not overlap. Accordingly, the described polynucleotide 
sequences shall typically comprise at least about two or 
three distinct oligonucleotide sequences of at least about 8 
nucleotides in length that are each first disclosed in the 
described Sequence Listing. Such oligonucleotide
sequences can begin at any nucleotide present within a 
sequence in the Sequence Listing and proceed in either a 
sense (5'-to-3') orientation vis-a-vis the described sequence
or in an antisense orientation. 
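
The tiling described above (oligonucleotides of a chosen length that either overlap or abut, taken in the sense or the antisense orientation) can be sketched as follows. The function names, the default length of 25, and the example sequence are hypothetical and chosen only for illustration; they are not taken from the Sequence Listing.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_oligos(seq, length=25, step=None, antisense=False):
    """Split `seq` into oligonucleotides of `length` bases.

    step == length (the default) gives non-overlapping oligos;
    step < length gives partially overlapping oligos.
    Returns a list of (0-based start, oligo) tuples.
    """
    if length <= 0 or length > len(seq):
        raise ValueError("length must be between 1 and len(seq)")
    if step is None:
        step = length
    if step <= 0:
        raise ValueError("step must be positive")
    oligos = []
    for start in range(0, len(seq) - length + 1, step):
        window = seq[start:start + length]
        oligos.append((start, reverse_complement(window) if antisense else window))
    return oligos


if __name__ == "__main__":
    example = "ACGTACGTACGT"  # hypothetical 12-base sequence
    print(tile_oligos(example, length=4))                  # non-overlapping
    print(tile_oligos(example, length=4, step=2))          # overlapping
    print(tile_oligos("AACC", length=4, antisense=True))   # antisense orientation
```

Choosing a step smaller than the oligonucleotide length trades a larger probe set for redundant coverage of each position.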
Microarray-based analysis allows the discovery of broad
patterns of genetic activity, providing new understanding of 
gene functions and generating novel and unexpected insight 
into transcriptional processes and biological mechanisms. 
The use of addressable arrays comprising sequences first 
disclosed in SEQ ID NOS: 1-11 provides detailed informa-
tion about transcriptional changes involved in a specific
pathway, potentially leading to the identification of novel 
components or gene functions that manifest themselves as 
novel phenotypes. 
Probes consisting of sequences first disclosed in SEQ ID
NOS: 1-11 can also be used in the identification, selection
and validation of novel molecular targets for drug discovery. 
The use of these unique sequences permits the direct con- 
firmation of drug targets and recognition of drug dependent 
changes in gene expression that are modulated through
pathways distinct from the drug's intended target. These
unique sequences therefore also have utility in defining and 
monitoring both drug action and toxicity. 
As an example of utility, the sequences first disclosed in 
SEQ ID NOS: 1-11 can be utilized in microarrays or other
assay formats, to screen collections of genetic material from 
patients who have a particular medical condition. These 
investigations can also be carried out using the sequences 
first disclosed in SEQ ID NOS: 1-11 in silico and by 
comparing previously collected genetic databases and the
disclosed sequences using computer software known to 
those in the art. 
Thus the sequences first disclosed in SEQ ID NOS: 1-11
can be used to identify mutations associated with a particular 
disease and also as a diagnostic or prognostic assay. 

Although the presently described sequences have been 
specifically described using nucleotide sequence, it should 
be appreciated that each of the sequences can uniquely be 
described using any of a wide variety of additional structural 
attributes, or combinations thereof. For example, a given 
sequence can be described by the net composition of the 
nucleotides present within a given region of the sequence in 
conjunction with the presence of one or more specific 
oligonucleotide sequence(s) first disclosed in the SEQ ID 
NOS: 1-11. Alternatively, a restriction map specifying the 
relative positions of restriction endonuclease digestion sites, 
or various palindromic or other specific oligonucleotide 
sequences can be used to structurally describe a given 
sequence. Such restriction maps, which are typically gener-
ated by widely available computer programs (e.g., the Uni-
versity of Wisconsin GCG sequence analysis package,
SEQUENCHER 3.0, Gene Codes Corp., Ann Arbor, Mich., 
etc.), can optionally be used in conjunction with one or more 
discrete nucleotide sequence(s) present in the sequence that 
can be described by the relative position of the sequence 
relative to one or more additional sequence(s) or one or more
restriction sites present in the disclosed sequence. 
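
A restriction map of the kind described above can be sketched as a list of positions at which each recognition site occurs. The sketch below is not the GCG or SEQUENCHER implementation; the three enzymes (EcoRI, BamHI, HindIII, with their well known palindromic recognition sites), the function name, and the example sequence are assumptions chosen for illustration.

```python
# Recognition sites of three commonly used restriction endonucleases.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def restriction_map(seq, sites=SITES):
    """Return {enzyme: [1-based start positions of each recognition site]}.

    Only the given strand is scanned, which is sufficient here because the
    listed sites are palindromic (each equals its own reverse complement).
    """
    seq = seq.upper()
    result = {}
    for name, site in sites.items():
        positions = []
        idx = seq.find(site)
        while idx != -1:
            positions.append(idx + 1)  # report 1-based coordinates
            idx = seq.find(site, idx + 1)
        result[name] = positions
    return result


if __name__ == "__main__":
    example = "AAGAATTCGGGGATCCTTGAATTC"  # hypothetical sequence
    print(restriction_map(example))
```

The relative spacing of the reported positions is what distinguishes one sequence from another in such a map.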

For oligonucleotide probes, highly stringent conditions 
may refer, e.g., to washing in 6xSSC/0.05% sodium pyro- 
phosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base 
oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base
oligos). These nucleic acid molecules may encode or act as 
NHP gene antisense molecules, useful, for example, in NHP 
gene regulation (for and/or as antisense primers in amplifi- 
cation reactions of NHP gene nucleic acid sequences). With 
respect to NHP gene regulation, such techniques can be used 
to regulate biological functions. Further, such sequences 
may be used as part of ribozyme and/or triple helix 
sequences that are also useful for NHP gene regulation. 
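
The example wash temperatures given above for oligonucleotide probes (6xSSC/0.05% sodium pyrophosphate) can be captured in a small lookup table. The sketch below records only the four lengths stated in the text and deliberately does not interpolate for other lengths; the names are hypothetical.

```python
# Example highly stringent wash temperatures (degrees C) by oligo length,
# as stated in the text for washing in 6xSSC/0.05% sodium pyrophosphate.
OLIGO_WASH_TEMP_C = {14: 37, 17: 48, 20: 55, 23: 60}


def wash_temperature_c(oligo_length: int) -> int:
    """Return the example wash temperature for a stated oligo length.

    Lengths not listed in the text raise ValueError rather than being
    interpolated, since the text gives no rule for other lengths.
    """
    try:
        return OLIGO_WASH_TEMP_C[oligo_length]
    except KeyError:
        raise ValueError(
            f"no example wash temperature is stated for {oligo_length}-base oligos"
        ) from None


if __name__ == "__main__":
    for length in sorted(OLIGO_WASH_TEMP_C):
        print(length, wash_temperature_c(length))
```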

Inhibitory antisense or double stranded oligonucleotides 
can additionally comprise at least one modified base moiety 
which is selected from the group including but not limited to 
5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, 
hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl) uracil,
5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-
galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine,
5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-
thiouracil, beta-D-mannosylqueosine,
5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic 
acid (v), wybutoxosine, pseudouracil, queosine, 
2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid 
methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-
thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w,
and 2,6-diaminopurine.

The antisense oligonucleotide can also comprise at least 
one modified sugar moiety selected from the group includ- 
ing but not limited to arabinose, 2-fluoroarabinose, xylulose, 
and hexose.

In yet another embodiment, the antisense oligonucleotide 
will comprise at least one modified phosphate backbone 
selected from the group consisting of a phosphorothioate, a 
phosphorodithioate, a phosphoramidothioate, a
phosphoramidate, a phosphordiamidate, a
methylphosphonate, an alkyl phosphotriester, and a formac- 
etal or analog thereof. 
In yet another embodiment, the antisense oligonucleotide
is an α-anomeric oligonucleotide. An α-anomeric oligo-
nucleotide forms specific double-stranded hybrids with
complementary RNA in which, contrary to the usual β-units,
the strands run parallel to each other (Gautier et al., 1987,
Nucl. Acids Res. 15:6625-6641). The oligonucleotide is a
2'-O-methylribonucleotide (Inoue et al., 1987, Nucl. Acids
Res. 15:6131-6148), or a chimeric RNA-DNA analogue
(Inoue et al., 1987, FEBS Lett. 215:327-330). Alternatively,
double stranded RNA can be used to disrupt the expression
and function of a targeted NHP.

Oligonucleotides of the invention can be synthesized by 
standard methods known in the art, e.g. by use of an 
automated DNA synthesizer (such as are commercially 
available from Biosearch, Applied Biosystems, etc.). As 
examples, phosphorothioate oligonucleotides can be synthe-
sized by the method of Stein et al. (1988, Nucl. Acids Res.
16:3209), and methylphosphonate oligonucleotides can be 
prepared by use of controlled pore glass polymer supports 
(Sarin et al., 1988, Proc. Natl. Acad. Sci. U.S.A. 
85:7448-7451), etc.

Low stringency conditions are well known to those of 
skill in the art, and will vary predictably depending on the 
specific organisms from which the library and the labeled 
sequences are derived. For guidance regarding such condi-
tions see, for example, Sambrook et al., 1989, Molecular
Cloning, A Laboratory Manual (and periodic updates
thereof), Cold Spring Harbor Press, N.Y.; and Ausubel et
al., 1989, Current Protocols in Molecular Biology, Green 
Publishing Associates and Wiley Interscience, N.Y. 
Alternatively, suitably labeled NHP nucleotide probes can
be used to screen a human genomic library using appropri- 
ately stringent conditions or by PCR. The identification and 
characterization of human genomic clones is helpful for 
identifying polymorphisms (including, but not limited to, 
nucleotide repeats, microsatellite alleles, single nucleotide
polymorphisms, or coding single nucleotide 
polymorphisms), determining the genomic structure of a 
given locus/allele, and designing diagnostic tests. For 
example, sequences derived from regions adjacent to the 
intron/exon boundaries of the human gene can be used to
design primers for use in amplification assays to detect 
mutations within the exons, introns, splice sites (e.g., splice 
acceptor and/or donor sites), etc., that can be used in 
diagnostics and pharmacogenomics. 
Further, a NHP gene homolog can be isolated from
nucleic acid from an organism of interest by performing 
PCR using two degenerate or "wobble" oligonucleotide 
primer pools designed on the basis of amino acid sequences
within the NHP products disclosed herein. The template for
the reaction may be total RNA, mRNA, and/or cDNA
obtained by reverse transcription of mRNA prepared from
human or non-human cell lines or tissue known or suspected 
to express an allele of a NHP gene. 
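
When degenerate ("wobble") primer pools are designed from an amino acid sequence as described above, the size of the pool depends on how many codons encode each residue. The sketch below is a hypothetical helper, not a method stated in the text: it multiplies the codon counts of the standard genetic code to give the pool degeneracy and picks the least degenerate window of a peptide. The peptide in the example is invented for illustration.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNT = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def pool_degeneracy(peptide: str) -> int:
    """Number of distinct nucleotide sequences that can encode `peptide`."""
    total = 1
    for residue in peptide.upper():
        total *= CODON_COUNT[residue]  # KeyError for non-standard residues
    return total


def least_degenerate_window(peptide: str, size: int):
    """Return (0-based start, sub-peptide, degeneracy) for the window of
    `size` residues with the smallest degeneracy; ties go to the first window."""
    if size <= 0 or size > len(peptide):
        raise ValueError("size must be between 1 and len(peptide)")
    best = None
    for start in range(len(peptide) - size + 1):
        window = peptide[start:start + size]
        degeneracy = pool_degeneracy(window)
        if best is None or degeneracy < best[2]:
            best = (start, window, degeneracy)
    return best


if __name__ == "__main__":
    print(pool_degeneracy("MWKDE"))
    print(least_degenerate_window("LSRMWKDEL", 5))
```

Windows rich in methionine and tryptophan, which have a single codon each, give the smallest pools.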
The PCR product can be subcloned and sequenced to 
ensure that the amplified sequences represent the sequence
of the desired NHP gene. The PCR fragment can then be
used to isolate a full length cDNA clone by a variety of
methods. For example, the amplified fragment can be
labeled and used to screen a cDNA library, such as a
bacteriophage cDNA library. Alternatively, the labeled frag-
ment can be used to isolate genomic clones via the screening
of a genomic library. 
PCR technology can also be used to isolate full length
cDNA sequences. For example, RNA can be isolated, fol-
lowing standard procedures, from an appropriate cellular or
tissue source (i.e., one known, or suspected, to express a
NHP gene). A reverse transcription (RT) reaction can be
performed on the RNA using an oligonucleotide primer
specific for the most 5' end of the amplified fragment for the
priming of first strand synthesis. The resulting RNA/DNA
hybrid may then be "tailed" using a standard terminal
transferase reaction, the hybrid may be digested with RNase
H, and second strand synthesis may then be primed with a
complementary primer. Thus, cDNA sequences upstream of
the amplified fragment can be isolated. For a review of
cloning strategies that can be used, see e.g., Sambrook et al.,
1989, supra.

A cDNA encoding a mutant NHP gene can be isolated, for
example, by using PCR. In this case, the first cDNA strand
may be synthesized by hybridizing an oligo-dT oligonucle-
otide to mRNA isolated from tissue known or suspected to
be expressed in an individual putatively carrying a mutant
NHP allele, and by extending the new strand with reverse
transcriptase. The second strand of the cDNA is then syn-
thesized using an oligonucleotide that hybridizes specifi-
cally to the 5' end of the normal gene. Using these two
primers, the product is then amplified via PCR, optionally
cloned into a suitable vector, and subjected to DNA
sequence analysis through methods well known to those of
skill in the art. By comparing the DNA sequence of the
mutant NHP allele to that of a corresponding normal NHP
allele, the mutation(s) responsible for the loss or alteration
of function of the mutant NHP gene product can be ascer-
tained.

Alternatively, a genomic library can be constructed using
DNA obtained from an individual suspected of or known to
carry a mutant NHP allele (e.g., a person manifesting a
NHP-associated phenotype such as, for example, obesity,
high blood pressure, connective tissue disorders, infertility,
etc.), or a cDNA library can be constructed using RNA from
a tissue known, or suspected, to express a mutant NHP
allele. A normal NHP gene, or any suitable fragment thereof,
can then be labeled and used as a probe to identify the
corresponding mutant NHP allele in such libraries. Clones
containing mutant NHP gene sequences can then be purified
and subjected to sequence analysis according to methods
well known to those skilled in the art.

Additionally, an expression library can be constructed
utilizing cDNA synthesized from, for example, RNA iso-
lated from a tissue known, or suspected, to express a mutant
NHP allele in an individual suspected of or known to carry
such a mutant allele. In this manner, gene products made by
the putatively mutant tissue can be expressed and screened
using standard antibody screening techniques in conjunction
with antibodies raised against a normal NHP product, as
described below. (For screening techniques, see, for
example, Harlow, E. and Lane, eds., 1988, "Antibodies: A
Laboratory Manual", Cold Spring Harbor Press, Cold Spring
Harbor, N.Y.).

Additionally, screening can be accomplished by screening
with labeled NHP fusion proteins, such as, for example,
alkaline phosphatase-NHP or NHP-alkaline phosphatase
fusion proteins. In cases where a NHP mutation results in an
expressed gene product with altered function (e.g., as a
result of a missense or a frameshift mutation), polyclonal
antibodies to a NHP are likely to cross-react with a corre-
sponding mutant NHP gene product. Library clones detected
via their reaction with such labeled antibodies can be
purified and subjected to sequence analysis according to
methods well known in the art.

The invention also encompasses (a) DNA vectors that
contain any of the foregoing NHP coding sequences and/or
their complements (i.e., antisense); (b) DNA expression
vectors that contain any of the foregoing NHP coding
sequences operatively associated with a regulatory element
that directs the expression of the coding sequences (for
example, baculovirus as described in U.S. Pat. No. 5,869,336
herein incorporated by reference); (c) genetically engi-
neered host cells that contain any of the foregoing NHP
coding sequences operatively associated with a regulatory
element that directs the expression of the coding sequences
in the host cell; and (d) genetically engineered host cells that
express an endogenous NHP gene under the control of an
exogenously introduced regulatory element (i.e., gene
activation). As used herein, regulatory elements include, but
are not limited to, inducible and non-inducible promoters,
enhancers, operators and other elements known to those
skilled in the art that drive and regulate expression. Such
regulatory elements include but are not limited to the
cytomegalovirus (hCMV) immediate early gene,
regulatable, viral elements (particularly retroviral LTR
promoters), the early or late promoters of SV40 adenovirus,
the lac system, the trp system, the TAC system, the TRC
system, the major operator and promoter regions of phage
lambda, the control regions of fd coat protein, the promoter
for 3-phosphoglycerate kinase (PGK), the promoters of acid
phosphatase, and the promoters of the yeast α-mating fac-
tors.

The present invention also encompasses antibodies and
anti-idiotypic antibodies (including Fab fragments), antago-
nists and agonists of the NHP, as well as compounds or
nucleotide constructs that inhibit expression of a NHP gene
(transcription factor inhibitors, antisense and ribozyme
molecules, or gene or regulatory sequence replacement
constructs), or promote the expression of a NHP (e.g.,
expression constructs in which NHP coding sequences are
operatively associated with expression control elements
such as promoters, promoter/enhancers, etc.).

The NHPs or NHP peptides, NHP fusion proteins, NHP
nucleotide sequences, antibodies, antagonists and agonists
can be useful for the detection of mutant NHPs or inappro-
priately expressed NHPs for the diagnosis of disease. The
NHP proteins or peptides, NHP fusion proteins, NHP nucle-
otide sequences, host cell expression systems, antibodies,
antagonists, agonists and genetically engineered cells and
animals can be used for screening for drugs (or high
throughput screening of combinatorial libraries) effective in
the treatment of the symptomatic or phenotypic manifesta-
tions of perturbing the normal function of NHP in the body.
The use of engineered host cells and/or animals may offer an
advantage in that such systems allow not only for the
identification of compounds that bind to the endogenous
receptor for an NHP, but can also identify compounds that
trigger NHP-mediated activities or pathways.

Finally, the NHP products can be used as therapeutics. For
example, soluble derivatives such as NHP peptides/domains
corresponding to NHPs, NHP fusion protein products
(especially NHP-Ig fusion proteins, i.e., fusions of a NHP, or
a domain of a NHP, to an IgFc), NHP antibodies and
anti-idiotypic antibodies (including Fab fragments), antago-
nists or agonists (including compounds that modulate or act
on downstream targets in a NHP-mediated pathway) can be
used to directly treat diseases or disorders. For instance, the
administration of an effective amount of soluble NHP, or a
NHP-IgFc fusion protein or an anti-idiotypic antibody (or its
Fab) that mimics the NHP could activate or effectively
antagonize the endogenous NHP receptor. Nucleotide con-
structs encoding such NHP products can be used to geneti-
cally engineer host cells to express such products in vivo;
these genetically engineered cells function as "bioreactors"
in the body delivering a continuous supply of a NHP, a NHP
peptide, or a NHP fusion protein to the body. Nucleotide
constructs encoding functional NHPs, mutant NHPs, as well
as antisense and ribozyme molecules can also be used in
"gene therapy" approaches for the modulation of NHP
expression. Thus, the invention also encompasses pharma-
ceutical formulations and methods for treating biological
disorders.

Various aspects of the invention are described in greater
detail in the subsections below.

The NHP Sequences

The cDNA sequences and the corresponding deduced
amino acid sequences of the described NHPs are presented
in the Sequence Listing. The NHP nucleotides were obtained
from clustered human gene trapped sequences, testis and
mammary transcript RACE products, ESTs, and human
brain, testis, trachea, pituitary, thymus, and mammary gland
cDNA libraries (Edge Biosystems, Gaithersburg, Md.).

SEQ ID NOS: 1-5 describe sequences that are similar to
eucaryotic ATP-driven ion pumps such as calcium transport-
ing ATPases, and which can be found expressed in a variety
of human cells and tissues. The described sequences were
assembled using gene trapped sequences and clones isolated
from human kidney, lymph node, and thymus cDNA librar-
ies (Edge Biosystems, Gaithersburg, Md.).

SEQ ID NOS: 6-8 describe sequences that are similar to,
inter alia, sulfate transporter and cotransporter proteins, and
can be found expressed in human bone marrow and testis.
Several polymorphisms were found in this NHP including,
but not limited to, possible A-to-G transitions at nucleotide
positions corresponding to nucleotides 589, 692, 917, 1,164,
and 2,390 of, for example SEQ ID NO:8 which can be silent or
can result in the met corresponding to amino acid position 73
of SEQ ID NO:7 converting to a val (e.g., met 73 converting
to val 73), val 148 converting to ile, asn 230 converting to
lys, ile 562 converting to val. An additional C-to-T transition
was identified that converts ala 777 to val. SEQ ID NOS: 6-8
can be expressed in bone marrow and predominantly in
testis cells. These NHPs were assembled from gene trapped
sequences and clones from a human testis cDNA library
(Edge Biosystems, Gaithersburg, Md.).

SEQ ID NOS: 9-11 describe sequences that are similar to
zinc transporters and vesicular transporters and can be found
expressed in, inter alia, placenta and adrenal gland, and these
NHP sequences were assembled using gene trapped
sequences and clones from human adrenal and placenta
cDNA libraries (Edge Biosystems, Gaithersburg, Md.).

Transporters and transporter related multidrug resistance
(MDR) sequences, as well as uses and applications that are
germane to the described NHPs, are described in U.S. Pat.
Nos. 5,198,344 and 5,866,699 which are herein incorporated
by reference in their entirety.

NHPs and NHP Polypeptides

NHPs, polypeptides, peptide fragments, mutated,
truncated, or deleted forms of the NHPs, and/or NHP fusion
proteins can be prepared for a variety of uses. These uses
include but are not limited to the generation of antibodies, as
reagents in diagnostic assays, the identification of other
cellular gene products related to a NHP, as reagents in assays
for screening for compounds that can be used as pharmaceutical
reagents useful in the therapeutic treatment of mental,
biological, or medical disorders and diseases. Given the
similarity information and expression data, the described
NHPs can be targeted (by drugs, oligos, antibodies, etc.) in
order to treat disease or to therapeutically augment the
efficacy of, for example, chemotherapeutic agents used in
the treatment of breast or prostate cancer.

The Sequence Listing discloses the amino acid sequences encoded
by the described NHP polynucleotides. The NHPs
typically display initiator methionines in DNA
sequence contexts consistent with a translation initiation
site.

The NHP amino acid sequences of the invention include
the amino acid sequences presented in the Sequence Listing
as well as analogues and derivatives thereof. Further, cor-
responding NHP homologues from other species are encom-
passed by the invention. In fact, any NHP protein encoded
by the NHP nucleotide sequences described above are within
the scope of the invention, as are any novel polynucleotide
sequences encoding all or any novel portion of an amino
acid sequence presented in the Sequence Listing. The degen-
erate nature of the genetic code is well known, and,
accordingly, each amino acid presented in the Sequence
Listing, is generically representative of the well known
nucleic acid "triplet" codon, or in many cases codons, that
can encode the amino acid. As such, as contemplated herein,
the amino acid sequences presented in the Sequence Listing,
when taken together with the genetic code (see, for example,
Table 4-1 at page 109 of "Molecular Cell Biology", 1986, J.
Darnell et al. eds., Scientific American Books, New York,
N.Y., incorporated by reference) are generically rep-
resentative of all the various permutations and combinations
of nucleic acid sequences that can encode such amino acid
sequences.

The invention also encompasses proteins that are func-
tionally equivalent to the NHPs encoded by the presently
described nucleotide sequences as judged by any of a
number of criteria, including, but not limited to, the ability
to bind and cleave a substrate of a NHP, or the ability to
effect an identical or complementary downstream pathway,
or a change in cellular metabolism (e.g., proteolytic activity,
ion flux, tyrosine phosphorylation, etc.). Such functionally
equivalent NHP proteins include, but are not limited to,
additions or substitutions of amino acid residues within the
amino acid sequence encoded by the NHP nucleotide
sequences described above, but which result in a silent
change, thus producing a functionally equivalent gene prod-
uct. Amino acid substitutions can be made on the basis of
similarity in polarity, charge, solubility, hydrophobicity,
hydrophilicity, and/or the amphipathic nature of the residues
involved. For example, nonpolar (hydrophobic) amino acids
include alanine, leucine, isoleucine, valine, proline,
phenylalanine, tryptophan, and methionine; polar neutral
amino acids include glycine, serine, threonine, cysteine,
tyrosine, asparagine, and glutamine; positively charged
(basic) amino acids include arginine, lysine, and histidine;
and negatively charged (acidic) amino acids include aspartic
acid and glutamic acid.

A variety of host-expression vector systems can be used
to express the NHP nucleotide sequences of the invention.
Where, as in the present instance, the NHP peptide or
polypeptide is thought to be membrane protein, the hydro-
phobic regions of the protein can be excised and the result-
ing soluble peptide or polypeptide can be recovered from the
culture media. Such expression systems also encompass
engineered host cells that express a NHP, or functional
equivalent, in situ. Purification or enrichment of a NHP from
such expression systems can be accomplished using appro- is used as an expression vector, the NHP nucleotide
priate detergents and lipid micelles and methods well known sequence of interest may be ligated to an adenovirus
to those skilled in the art. However, such engineered host transcription/translation control complex, e.g., the late pro- 
cells themselves may be used in situations where it is moter and tripartite leader sequence. This chimeric gene 
important not only to retain the structural and functional 5 may then be inserted in the adenovirus genome by in vitro 
characteristics of the NHP, but to assess biological activity, or in vivo recombination. Insertion in a non-essential region 
e.g., in drug screening assays. of the viral genome (e.g., region El or E3) will result in a 

The expression systems that may be used for purposes of recombinant virus that is viable and capable of expressing a 

the invention include but are not limited to microorganisms NHP product in infected hosts (e.g., See Logan & Shenk, 

such as bacteria (e.g., E. coli, B. subtilis) transformed with 1984, Proc. Natl. Acad. Sci. USA 81:3655-3659). Specific

recombinant bacteriophage DNA, plasmid DNA or cosmid initiation signals may also be required for efficient transla- 

DNA expression vectors containing NHP nucleotide tion of inserted NHP nucleotide sequences. These signals 

sequences; yeast (e.g., Saccharomyces, Pichia) transformed include the ATG initiation codon and adjacent sequences. In 

with recombinant yeast expression vectors containing NHP cases where an entire NHP gene or cDNA, including its own 

nucleotide sequences; insect cell systems infected with 15 initiation codon and adjacent sequences, is inserted into the 

recombinant virus expression vectors (e.g., baculovirus) appropriate expression vector, no additional tra relational 

containing NHP sequences; plant cell systems infected with control signals may be needed. However, in cases where 

recombinant virus expression vectors (e.g., cauliflower only a portion of a NHP coding sequence is inserted, 

mosaic vims, CaMV; tobacco mosaic virus, TMV) or trans- exogenous translational control signals, including, perhaps, 

formed with recombinant plasmid expression vectors (e.g., 20 the ATG initiation codon, must be provided. Furthermore, 

Ti plasmid) containing NHP nucleotide sequences; or mam- the initiation codon must be in phase with the reading frame 

malian cell systems (e.g., COS, CHO, BHK, 293, 3T3) of the desired coding sequence to ensure translation of the 

harboring recombinant expression constructs containing entire insert. These exogenous translational control signals 

promoters derived from the genome of mammalian cells and initiation codons can be of a variety of origins, both 

(e.g., metallothionein promoter) or from mammalian viruses 25 natural and synthetic. The efficiency of expression may be 

(e.g., the adenovirus late promoter; the vaccinia virus 7.5K enhanced by the inclusion of appropriate transcription 

promoter). enhancer elements, transcription terminators, etc. (See Bitt- 

In bacterial systems, a number of expression vectors may ner et al., 1987, Methods in Enzymol. 153:516-544). 
be advantageously selected depending upon the use intended In addition, a host cell strain may be chosen that modu- 
for the NHP product being expressed. For example, when a 30 lates the expression of the inserted sequences, or modifies 
large quantity of such a protein is to be produced for the and processes the gene product in the specific fashion 
generation of pharmaceutical compositions of or containing desired. Such modifications (e.g., glycosylation) and pro- 
NHP, or for raising antibodies to a NHP, vectors that direct cessing (e.g., cleavage) of protein products may be impor- 
the expression of high levels of fusion protein products that tant for the function of the protein. Different host cells have 
are readily purified may be desirable. Such vectors include, 35 characteristic and specific mechanisms for the post- 
but are not limited, to the E. coli expression vector pUR278 translational processing and modification of proteins and 
(Ruther et al., 1983, EMBO J. 2:1791), in which a NHP gene products. Appropriate cell lines or host systems can be 
coding sequence may be ligated individually into the vector chosen to ensure the correct modification and processing of 
in frame with the lacZ coding region so that a fusion protein the foreign protein expressed. To this end, eukaryotic host 
is produced; pIN vectors (Inouye & Inouye, 1985, Nucleic 40 cells which possess the cellular machinery for proper pro- 
Acids Res. 13:3101-3109; Van Heeke & Schuster, 1989, J. cessing of the primary transcript, glycosylation, and phos- 
Biol. Chem. 264:5503-5509); and the like. pGEX vectors phorylation of the gene product may be used. Such mam- 
(Pharmacia or American Type Culture Collection) can also malian host cells include, but are not limited to, CHO, 
be used to express foreign polypeptides as fusion proteins VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and in 
with glutathione S-transferase (GST). In general, such 45 particular, human cell lines. 

fusion proteins are soluble and can easily be purified from For long-term, high-yield production of recombinant 

lysed cells by adsorption to glutathione-agarose beads fol- proteins, stable expression is preferred. For example, cell 

lowed by elution in the presence of free glutathione. The fines which stably express the NHP sequences described 

PGEX vectors are designed to include thrombin or factor Xa above can be engineered. Rather than using expression 

protease cleavage sites so that the cloned target gene product 50 vectors which contain viral origins of replication, host cells 

can be released from the GST moiety. can be transformed with DNA controlled by appropriate 

In an insect sysiem^Autographa californica nuclear poly- expression control elements (e.g., promoter, enhancer 

hidrosis virus (AcNPV) is used as a vector to express foreign sequences, transcription terminators, polyadenylation sites, 

genes. The virus grows in Spodoptera frugiperda cells. A etc.), and a selectable marker. Following the introduction of 

NHP coding sequence may be cloned individually into 55 the foreign DNA, engineered cells may be allowed to grow 

non-essential regions (for example the polyhedrin gene) of for 1-2 days in an enriched media, and then are switched to 

the virus and placed under control of an AcNPV promoter a selective media. The selectable marker in the recombinant 

(for example the polyhedrin promoter). Successful insertion plasmid confers resistance to the selection and allows cells 

of NHP coding sequence will result in inactivation of the to stably integrate the plasmid into their chromosomes and 

polyhedrin gene and production of non-occluded recombi- eo grow to form foci which in turn can be cloned and expanded 

nant virus (i.e., virus lacking the proteinaceous coat coded into cell lines. This method may advantageously be used to 

for by the polyhedrin gene). These recombinant viruses are engineer cell lines which express the NHP product. Such 

then used to infect Spodoptera frugiperda cells in which the engineered cell lines may be particularly useful in screening 

inserted sequence is expressed (e.g., see Smith et al., 1983, and evaluation of compounds that affect the endogenous 

J. Virol. 46:584; Smith, U.S. Pat. No. 4,215,051). 65 activity of the NHP product. 

In mammalian host cells, a number of viral-based expres- A number of selection systems may be used, including but 

sion systems may be utilized. In cases where an adenovirus not limited to the herpes simplex virus thymidine kinase 
(Wigler, et al., 1977, Cell 11:223), hypoxanthine-guanine phosphoribosyltransferase (Szybalska &
Szybalski, 1962, Proc. Natl. Acad. Sci. USA 48:2026), and adenine phosphoribosyltransferase (Lowy,
et al., 1980, Cell 22:817) genes can be employed in tk-, hgprt- or aprt- cells, respectively. Also,
antimetabolite resistance can be used as the basis of selection for the following genes: dhfr,
which confers resistance to methotrexate (Wigler, et al., 1980, Natl. Acad. Sci. USA 77:3567;
O'Hare, et al., 1981, Proc. Natl. Acad. Sci. USA 78:1527); gpt, which confers resistance to
mycophenolic acid (Mulligan & Berg, 1981, Proc. Natl. Acad. Sci. USA 78:2072); neo, which confers
resistance to the aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. Mol. Biol. 150:1); and
hygro, which confers resistance to hygromycin (Santerre, et al., 1984, Gene 30:147).

Alternatively, any fusion protein can be readily purified by utilizing an antibody specific for the
fusion protein being expressed. For example, a system described by Janknecht et al. allows for the
ready purification of non-denatured fusion proteins expressed in human cell lines (Janknecht, et
al., 1991, Proc. Natl. Acad. Sci. USA 88:8972-8976). In this system, the gene of interest is
subcloned into a vaccinia recombination plasmid such that the gene's open reading frame is
translationally fused to an amino-terminal tag consisting of six histidine residues. Extracts from
cells infected with recombinant vaccinia virus are loaded onto Ni2+-nitriloacetic acid-agarose
columns and histidine-tagged proteins are selectively eluted with imidazole-containing buffers.

Also encompassed by the present invention are fusion [illegible] transport across the membrane into
the [illegible] of NHPs to antibody molecules or their Fab fragments could be used to target cells
bearing a particular epitope. Attaching the appropriate signal sequence to the NHP would also
transport the NHP to the desired location within the cell. Alternatively targeting of NHP or its
nucleic acid sequence might be achieved using liposome or lipid complex based delivery systems.
Such technologies are described in Liposomes: A Practical Approach, New, RRC ed., Oxford University
Press, New York and in U.S. Pat. Nos. 4,594,595, 5,459,127, 5,948,767 and 6,110,490 and their
respective disclosures which are herein incorporated by reference in their entirety. Additionally
embodied are novel protein constructs engineered in such a way that they facilitate transport of
the NHP to the target site or desired organ. This goal may be achieved by coupling of the NHP to a
cytokine or other ligand that provides targeting specificity, and/or to a protein transducing
domain (see generally U.S. application Ser. Nos. 60/111,701 and 60/056,713, both of which are
herein incorporated by reference, for examples of such transducing sequences) to facilitate passage
across cellular membranes if needed and can optionally be engineered to include nuclear
localization sequences when desired.

Antibodies to NHP Products

Antibodies that specifically recognize one or more epitopes of a NHP, or epitopes of conserved
variants of a NHP, or peptide fragments of a NHP are also encompassed by the invention. Such
antibodies include but are not limited to polyclonal antibodies, monoclonal antibodies (mAbs),
humanized or chimeric antibodies, single chain antibodies, Fab fragments, F(ab')2 fragments,
fragments produced by a Fab expression library, anti-idiotypic (anti-Id) antibodies, and
epitope-binding fragments of any of the above.

The antibodies of the invention may be used, for example, in the detection of NHP in a biological
sample and may, therefore, be utilized as part of a diagnostic or prognostic technique whereby
patients may be tested for abnormal amounts of NHP. Such antibodies may also be utilized in
conjunction with, for example, compound screening schemes for the evaluation of the effect of test
compounds on expression and/or activity of a NHP gene product. Additionally, such antibodies can be
used [illegible] therapy to, for [illegible] engineered NHP-expressing cells prior to their
introduction [illegible]. Such antibodies may additionally be used as a method for the inhibition
of abnormal NHP activity. Thus, such antibodies may, therefore, be utilized as part of treatment
methods.

For the production of antibodies, various host animals may be immunized by injection with a NHP, an
NHP peptide (e.g., one corresponding to a functional domain of an NHP), truncated NHP polypeptides
(NHP in which one or more domains have been deleted), functional equivalents of the NHP or mutated
variant of the NHP. Such host animals may include but are not limited to pigs, rabbits, mice,
goats, and rats, to name but a few. Various adjuvants may be used to increase the immunological
response, depending on the host species, including but not limited to Freund's adjuvant (complete
and incomplete), mineral salts such as aluminum hydroxide or aluminum phosphate, surface active
substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, and
potentially useful human adjuvants such as BCG (bacille Calmette-Guerin) and Corynebacterium
parvum. Alternatively, the immune response could be enhanced by combination and or coupling with
molecules such as keyhole limpet hemocyanin, tetanus toxoid, diphtheria toxoid, ovalbumin, cholera
toxin or fragments thereof. Polyclonal antibodies are heterogeneous populations of antibody
molecules derived from the sera of the immunized animals.

Monoclonal antibodies, which are homogeneous populations of antibodies to a particular antigen, can
be obtained by any technique which provides for the production of antibody molecules by continuous
cell lines in culture. These include, but are not limited to, the hybridoma technique of Kohler and
Milstein (1975, Nature 256:495-497; and U.S. Pat. No. [illegible] human B-cell hybridoma technique
(Kosbor et al., 1983, Immunology Today [illegible]; Cole et al., 1983, Proc. Natl. Acad. Sci. USA
80:2026-2030), and the EBV-hybridoma technique (Cole et al., 1985, Monoclonal Antibodies And Cancer
Therapy, Alan R. Liss, Inc., pp. 77-96). Such antibodies may be of any immunoglobulin class
[illegible] IgD and any subclass thereof. The hybridoma producing the mAb of this invention may be
cultivated in vitro or in vivo. Production of high titers of mAbs in vivo makes this the presently
preferred method of production.

In addition, techniques developed for the production of "chimeric antibodies" (Morrison et al.,
1984, Proc. Natl. Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature, 312:604-608; Takeda et
al., 1985, Nature, 314:452-454) by splicing the genes from a mouse antibody molecule of appropriate
antigen specificity together with genes from a human antibody molecule of appropriate biological
activity can be used. A chimeric antibody is a molecule in which different portions are derived
from different animal species, such as those having a variable region derived from a murine mAb and
a human immunoglobulin constant region. Such technologies are described in U.S. Pat. Nos. 6,075,181
and 5,877,397 and their respective disclosures which are herein incorporated by reference in their
entirety. Also encompassed by the present invention is the use of fully humanized
monoclonal antibodies as described in U.S. Pat. No. 6,150,584 and respective disclosures which are
herein incorporated by reference in their entirety.

Alternatively, techniques [illegible] single chain antibodies (U.S. Pat. No. 4,946,778, Bird, 1988,
Science 242:423-426; Huston et al., 1988, Proc. Natl. Acad. Sci. USA 85:5879-5883; and Ward et al.,
1989, Nature 334:544-546) can be adapted to produce single chain antibodies against NHP gene
products. Single chain antibodies are formed by linking the heavy and light chain fragments of the
Fv region via an amino acid bridge, resulting in a single chain polypeptide.

Antibody fragments which recognize specific epitopes may be generated by known techniques. For
example, such fragments include, but are not limited to: the F(ab')2 fragments which can be
produced by pepsin digestion of the antibody molecule and the Fab fragments which can be generated
by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, Fab expression libraries
may be constructed (Huse et al., 1989, Science, 246:1275-1281) to allow rapid and easy
identification of monoclonal Fab fragments with the desired specificity.

Antibodies to a NHP can, in turn, be utilized to generate anti-idiotype antibodies that "mimic" a
given NHP, using techniques well known to those skilled in the art (See, e.g., Greenspan & Bona,
1993, FASEB J 7(5):437-444; and Nissinoff, 1991, J. Immunol. 147(8):2429-2438). For example,
antibodies which bind to a NHP domain and competitively inhibit the binding of NHP to its cognate
receptor can be used to generate anti-idiotypes that "mimic" the NHP and, therefore, bind and
activate or neutralize a receptor. Such anti-idiotypic antibodies or Fab fragments of such
anti-idiotypes can be used in therapeutic regimens involving a NHP mediated pathway.

The present invention is not to be limited in scope by the specific embodiments described herein,
which are intended as single illustrations of individual aspects of the invention, and functionally
equivalent methods and components are within the scope of the invention. Indeed, various
modifications of the invention, in addition to those shown and described herein will become
apparent to those skilled in the art from the foregoing description. Such modifications are
intended to fall within the scope of the appended claims. All cited publications, patents, and
patent applications are herein incorporated by reference in their entirety.
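
By way of illustration only, the requirement stated earlier in this description, that a supplied
ATG initiation codon be in phase with the reading frame of the desired coding sequence, can be
sketched in a few lines of Python. The helper names (codons, in_phase) and the two linkers are
hypothetical and are not taken from the specification; the insert is simply the five codons that
follow the ATG in SEQ ID NO 1.

```python
def codons(seq, start=0):
    """Split seq into complete triplets, reading from index start."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def in_phase(atg_index, insert_index):
    """True when the insert begins a whole number of codons after the ATG."""
    return (insert_index - atg_index) % 3 == 0


# Codons 2-6 of SEQ ID NO 1 (tgg cgc tgg atc cgg), used here as a partial coding insert.
INSERT = "tggcgctggatccgg"

# Hypothetical constructs: an exogenous ATG joined to the insert by a 3-nt or a 2-nt linker.
in_frame = "atg" + "gga" + INSERT
shifted = "atg" + "gg" + INSERT

for name, construct in (("in_frame", in_frame), ("shifted", shifted)):
    start = construct.index(INSERT)
    print(name, in_phase(0, start), codons(construct))
```

With the three-nucleotide linker the insert is read as its original codons; with the two-nucleotide
linker every downstream codon changes, which is the situation the in-phase requirement is meant to
avoid.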



SEQUENCE LISTING 



<160> NUMBER OF SEQ ID NOS: 11 

<210> SEQ ID NO 1 

<211> LENGTH: 3534 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 1 



atgtggcgct ggatccggca gcagctgggt tttgacccac cacatcagag tgacacaaga 60

accatctacg tagccaacag gtttcctcag aatggccttt acacacctca gaaatttata 120

gataacagga tcatttcatc taagtacact gtgtggaatt ttgttccaaa aaatttattt 180

gaacagttca gaagagtggc aaacttttat tttcttatta tatttttggt tcagcttatg 240

attgatacac ctaccagtcc agttaccagt ggacttccat tattctttgt gataacagta 300

actgccataa agcagggata tgaagattgg ttacggcata actcagataa tgaagtaaat 360

ggagctcctg tttatgttgt tcgaagtggt ggccttgtaa aaactagatc aaaaaacatt 420

cgggtgggtg atattgttcg aatagccaaa gatgaaattt ttcctgcaga cttggtgctt 480

ctgtcctcag atcgactgga tggttcctgt cacgttacaa ctgctagttt ggacggagaa 540

actaacctga agacacatgt ggcagttcca gaaacagcat tattacaaac agttgccaat 600

ttggacactc tagtagctgt aatagaatgc cagcaaccag aagcagactt atacagattc 660

atgggacgaa tgatcataac ccaacaaatg gaagaaattg taagacctct ggggccggag 720

agtctcctgc ttcgtggagc cagattaaaa aacacaaaag aaatttttgg tgttgcggta 780

tacactggaa tggaaactaa gatggcatta aattacaaga gcaaatcaca gaaacgatct 840

gcagtagaaa agtcaatgaa tacatttttg ataatttatc tagtaattct tatatctgaa 900

gctgtcatca gcactatctt gaagtataca tggcaagctg aagaaaaatg ggatgaacct 960

tggtataacc aaaaaacaga acatcaaaga aatagcagta agattctgag atttatttca 1020

gacttccttg cttttttggt tctctacaat ttcatcattc caatttcatt atatgtgaca 1080

gtcgaaatgc agaaatttct tggatcattt tttattggct gggatcttga tctgtatcat 1140



gaagaatcag atcagaaagc tcaagtcaat acttccgatc tgaatgaaga gcttggacag 1200

gtagagtacg tgtttacaga taaaactggt acactgacag aaaatgagat gcagtttcgg 1260

gaatgttcaa ttaatggcat gaaataccaa gaaattaatg gtagacttgt acccgaagga 1320

ccaacaccag actcttcaga aggaaactta tcttatctta gtagtttatc ccatcttaac 1380

aacttatccc atcttacaac cagttcctct ttcagaacca gtcctgaaaa tgaaactgaa 1440

ctaattaaag aacatgatct cttctttaaa gcagtcagtc tctgtcacac tgtacagatt 1500

agcaatgttc aaactgactg cactggtgat ggtccctggc aatccaacct ggcaccatcg 1560

cagttggagt actatgcatc ttcaccagat gaaaaggctc tagtagaagc tgctgcaagg 1620

attggtattg [illegible] caattctgaa gaaactatgg aggttaaaac tcttggaaaa 1680

[illegible] [illegible] tcatattctg gaatttgatt cagatcgtag gagaatgagt 1740

[illegible] [illegible] aggtgagaag ttattatttg ctaaaggagc tgagtcatca 1800

attctcccta [illegible] tggagaaata gaaaaaacca gaattcatgt agatgaattt 1860

[illegible] [illegible] tctgtgtata gcatatagaa aatttacatc aaaagagtat 1920

gaggaaatag [illegible] atttgaagcc aggactgcct tgcagcagcg ggaagagaaa 1980

ttggcagctg [illegible] catagagaaa gacctgatat tacttggagc cacagcagta 2040

gaagacagac [illegible] [illegible] actattgaag cattgagaat ggctggtatc 2100

aaagtatggg [illegible] ggataaacat gaaacagctg ttagtgtgag tttatcatgt 2160

ggccattttc [illegible] [illegible] gaacttataa accagaaatc agacagcgag 2220

tgtgctgaac [illegible] [illegible] agaattacag aggatcatgt gattcagcat 2280

gggctggtag [illegible] cagcctatct cttgcactca gggagcatga aaaactattt 2340

atggaagttt [illegible] ttcagctgta ttatgctgtc gtatggctcc actgcagaaa 2400

gcaaaagtaa [illegible] aaaaatatca cctgagaaac ctataacatt ggctgttggt 2460

gatggtgcta [illegible] catgatacaa gaagcccatg ttggcatagg aatcatgggt 2520

aaagaaggaa [illegible] aagaaacagt gactatgcaa tagccagatt taagttcctc 2580

[illegible] [illegible] tggtcatttt tattatatta gaatagctac ccttgtacag 2640

[illegible] [illegible] gtgctttatc acaccccagt ttttatatca gttctactgt 2700

[illegible] [illegible] gtatgacagc gtgtacctga ctttatacaa tatttgtttt 2760

acttccctac [illegible] atatagtctt ttggaacagc atgtagaccc tcatgtgtta 2820

caaaataagc ccacccttta tcgagacatt agtaaaaacc gcctcttaag tattaaaaca 2880

[illegible] [illegible] [illegible] catgccttta ttttcttttt tggatcctat 2940

ttactaatag [illegible] atctctgctt ggaaatggcc agatgttygg aaactggaca 3000

[illegible] [illegible] agtcatggtt attacagtca cagtaaagat ggctctggaa 3060

actcattttt ggacttggat caaccatctc gttacctggg gatctattat attttatttt 3120

gtattttcct tgttttatgg agggattctc tggccatttt tgggctccca gaatatgtat 3180

tttgtgttta ttcagctcct gtcaagtggt tctgcttggt ttgccataat cctcatggtt 3240

gttacatgtc tatttcttga tatcataaag aaggtctttg accgacacct ccaccctaca 3300

agtactgaaa aggcacagct tactgaaaca aatgcaggta tcaagtgctt ggactccatg 3360

tgctgtttcc cggaaggaga agcagcgtgt gcatctgttg gaagaatgct ggaacgagtt 3420

ataggaagat gtagtccaac ccacatcagc agatcatgga gtgcatcgga tcctttctat 3480

accaacgaca ggagcatctt gactctctcc acaatggact catctacttg ttaa 3534



<210> SEQ ID NO 2 

<211> LENGTH: 1177 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 2 

Met Trp Arg Trp Ile Arg Gln Gln Leu Gly Phe Asp Pro Pro His Gln
1 5 10 15

Ser Asp Thr Arg Thr Ile Tyr Val Ala Asn Arg Phe Pro Gln Asn Gly
20 25 30

Leu Tyr Thr Pro Gln Lys Phe Ile Asp Asn Arg Ile Ile Ser Ser Lys
35 40 45

Tyr Thr Val Trp Asn Phe Val Pro Lys Asn Leu Phe Glu Gln Phe Arg
50 55 60

Arg Val Ala Asn Phe Tyr Phe Leu Ile Ile Phe Leu Val Gln Leu Met
65 70 75 80

Ile Asp Thr Pro Thr Ser Pro Val Thr Ser Gly Leu Pro Leu Phe Phe
85 90 95

Val Ile Thr Val Thr Ala Ile Lys Gln Gly Tyr Glu Asp Trp Leu Arg
100 105 110

His Asn Ser Asp Asn Glu Val Asn Gly Ala Pro Val Tyr Val Val Arg
115 120 125

Ser Gly Gly Leu Val Lys Thr Arg Ser Lys Asn Ile Arg Val Gly Asp
130 135 140

Ile Val Arg Ile Ala Lys Asp Glu Ile Phe Pro Ala Asp Leu Val Leu
145 150 155 160

Leu Ser Ser Asp Arg Leu Asp Gly Ser Cys His Val Thr Thr Ala Ser
165 170 175

Leu Asp Gly Glu Thr Asn Leu Lys Thr His Val Ala Val Pro Glu Thr
180 185 190

Ala Leu Leu Gln Thr Val Ala Asn Leu Asp Thr Leu Val Ala Val Ile
195 200 205

Glu Cys Gln Gln Pro Glu Ala Asp Leu Tyr Arg Phe Met Gly Arg Met
210 215 220

Ile Ile Thr Gln Gln Met Glu Glu Ile Val Arg Pro Leu Gly Pro Glu
225 230 235 240

Ser Leu Leu Leu Arg Gly Ala Arg Leu Lys Asn Thr Lys Glu Ile Phe
245 250 255

Gly Val Ala Val Tyr Thr Gly Met Glu Thr Lys Met Ala Leu Asn Tyr
260 265 270

Lys Ser Lys Ser Gln Lys Arg Ser Ala Val Glu Lys Ser Met Asn Thr
275 280 285

Phe Leu Ile Ile Tyr Leu Val Ile Leu Ile Ser Glu Ala Val Ile Ser
290 295 300

Thr Ile Leu Lys Tyr Thr Trp Gln Ala Glu Glu Lys Trp Asp Glu Pro
305 310 315 320

Trp Tyr Asn Gln Lys Thr Glu His Gln Arg Asn Ser Ser Lys Ile Leu
325 330 335

Arg Phe Ile Ser Asp Phe Leu Ala Phe Leu Val Leu Tyr Asn Phe Ile
340 345 350

Ile Pro Ile Ser Leu Tyr Val Thr Val Glu Met Gln Lys Phe Leu Gly
355 360 365

Ser Phe Phe Ile Gly Trp Asp Leu Asp Leu Tyr His Glu Glu Ser Asp
370 375 380



Gln Lys Ala Gln Val Asn Thr Ser Asp Leu Asn Glu Glu Leu Gly Gln
385 390 395 400

Val Glu Tyr Val Phe Thr Asp Lys Thr Gly Thr Leu Thr Glu Asn Glu
405 410 415

Met Gln Phe Arg Glu Cys Ser Ile Asn Gly Met Lys Tyr Gln Glu Ile
420 425 430

Asn Gly Arg Leu Val Pro Glu Gly Pro Thr Pro Asp Ser Ser Glu Gly
435 440 445

Asn Leu Ser Tyr Leu Ser Ser Leu Ser His Leu Asn Asn Leu Ser His
450 455 460

Leu Thr Thr Ser Ser Ser Phe Arg Thr Ser Pro Glu Asn Glu Thr Glu
465 470 475 480

Leu Ile Lys Glu His Asp Leu Phe Phe Lys Ala Val Ser Leu Cys His
485 490 495

Thr Val Gln Ile Ser Asn Val Gln Thr Asp Cys Thr Gly Asp Gly Pro
500 505 510

Trp Gln Ser Asn Leu Ala Pro Ser Gln Leu Glu Tyr Tyr Ala Ser Ser
515 520 525

Pro Asp Glu Lys Ala Leu Val Glu Ala Ala Ala Arg Ile Gly Ile Val
530 535 540

Phe Ile Gly Asn Ser Glu Glu Thr Met Glu Val Lys Thr Leu Gly Lys
545 550 555 560

Leu Glu Arg Tyr Lys Leu Leu His Ile Leu Glu Phe Asp Ser Asp Arg
565 570 575

Arg Arg Met Ser Val Ile Val Gln Ala Pro Ser Gly Glu Lys Leu Leu
580 585 590

Phe Ala Lys Gly Ala Glu Ser Ser Ile Leu Pro Lys Cys Ile Gly Gly
595 600 605

Glu Ile Glu Lys Thr Arg Ile His Val Asp Glu Phe Ala Leu Lys Gly
610 615 620

Leu Arg Thr Leu Cys Ile Ala Tyr Arg Lys Phe Thr Ser Lys Glu Tyr
625 630 635 640

Glu Glu Ile Asp Lys Arg Ile Phe Glu Ala Arg Thr Ala Leu Gln Gln
645 650 655

Arg Glu Glu Lys Leu Ala Ala Val Phe Gln Phe Ile Glu Lys Asp Leu
660 665 670

Ile Leu Leu Gly Ala Thr Ala Val Glu Asp Arg Leu Gln Asp Lys Val
675 680 685

Arg Glu Thr Ile Glu Ala Leu Arg Met Ala Gly Ile Lys Val Trp Val
690 695 700

Leu Thr Gly Asp Lys His Glu Thr Ala Val Ser Val Ser Leu Ser Cys
705 710 715 720

Gly His Phe His Arg Thr Met Asn Ile Leu Glu Leu Ile Asn Gln Lys
725 730 735

Ser Asp Ser Glu Cys Ala Glu Gln Leu Arg Gln Leu Ala Arg Arg Ile
740 745 750

Thr Glu Asp His Val Ile Gln His Gly Leu Val Val Asp Gly Thr Ser
755 760 765

Leu Ser Leu Ala Leu Arg Glu His Glu Lys Leu Phe Met Glu Val Cys
770 775 780

Arg Asn Cys Ser Ala Val Leu Cys Cys Arg Met Ala Pro Leu Gln Lys
785 790 795 800
Ala Lys Val Ile Arg Leu Ile Lys Ile Ser Pro Glu Lys Pro Ile Thr
805 810 815

Leu Ala Val Gly Asp Gly Ala Asn Asp Val Ser Met Ile Gln Glu Ala
820 825 830

His Val Gly Ile Gly Ile Met Gly Lys Glu Gly Arg Gln Ala Ala Arg
835 840 845

Asn Ser Asp Tyr Ala Ile Ala Arg Phe Lys Phe Leu Ser Lys Leu Leu
850 855 860

Phe Val His Gly His Phe Tyr Tyr Ile Arg Ile Ala Thr Leu Val Gln
865 870 875 880

Tyr Phe Phe Tyr Lys Asn Val Cys Phe Ile Thr Pro Gln Phe Leu Tyr
885 890 895

Gln Phe Tyr Cys Leu Phe Ser Gln Gln Thr Leu Tyr Asp Ser Val Tyr
900 905 910

Leu Thr Leu Tyr Asn Ile Cys Phe Thr Ser Leu Pro Ile Leu Ile Tyr
915 920 925

Ser Leu Leu Glu Gln His Val Asp Pro His Val Leu Gln Asn Lys Pro
930 935 940

Thr Leu Tyr Arg Asp Ile Ser Lys Asn Arg Leu Leu Ser Ile Lys Thr
945 950 955 960

Phe Leu Tyr Trp Thr Ile Leu Gly Phe Ser His Ala Phe Ile Phe Phe
965 970 975

Phe Gly Ser Tyr Leu Leu Ile Gly Lys Asp Thr Ser Leu Leu Gly Asn
980 985 990

Gly Gln Met Phe Gly Asn Trp Thr Phe Gly Thr Leu Val Phe Thr Val
995 1000 1005

Met Val Ile Thr Val Thr Val Lys Met Ala Leu Glu Thr His Phe Trp
1010 1015 1020

Thr Trp Ile Asn His Leu Val Thr Trp Gly Ser Ile Ile Phe Tyr Phe
1025 1030 1035 1040

Val Phe Ser Leu Phe Tyr Gly Gly Ile Leu Trp Pro Phe Leu Gly Ser
1045 1050 1055

Gln Asn Met Tyr Phe Val Phe Ile Gln Leu Leu Ser Ser Gly Ser Ala
1060 1065 1070

Trp Phe Ala Ile Ile Leu Met Val Val Thr Cys Leu Phe Leu Asp Ile
1075 1080 1085

Ile Lys Lys Val Phe Asp Arg His Leu His Pro Thr Ser Thr Glu Lys
1090 1095 1100

Ala Gln Leu Thr Glu Thr Asn Ala Gly Ile Lys Cys Leu Asp Ser Met
1105 1110 1115 1120

Cys Cys Phe Pro Glu Gly Glu Ala Ala Cys Ala Ser Val Gly Arg Met
1125 1130 1135

Leu Glu Arg Val Ile Gly Arg Cys Ser Pro Thr His Ile Ser Arg Ser
1140 1145 1150

Trp Ser Ala Ser Asp Pro Phe Tyr Thr Asn Asp Arg Ser Ile Leu Thr
1155 1160 1165

Leu Ser Thr Met Asp Ser Ser Thr Cys
1170 1175



<210> SEQ ID NO 3 

<211> LENGTH: 1125

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 



<400> SEQUENCE: 3 

atgtggcgct ggatccggca gcagctgggt tttgacccac cacatcagag tgacacaaga 60 

accatctacg tagccaacag gtttcctcag aatggccttt acacacctca gaaatttata 120 

gataacagga tcatttcatc taagtacact gtgtggaatt ttgttccaaa aaatttattt 180 

gaacagttca gaagagtggc aaacttttat tttcttatta tatttttggt tcagcttatg 240 

attgatacac ctaccagtcc agttaccagt ggacttccat tattctttgt gataacagta 300 

actgccataa agcagggata tgaagattgg ttacggcata actcagataa tgaagtaaat 360 

ggagctcctg tttatgttgt tcgaagtggt ggccttgtaa aaactagatc aaaaaacatt 420 

cgggtgggtg atattgttcg aatagccaaa gatgaaattt ttcctgcaga cttggtgctt 480 

ctgtcctcag atcgactgga tggttcctgt cacgttacaa ctgctagttt ggacggagaa 540 

actaacctga agacacatgt ggcagttcca gaaacagcat tattacaaac agttgccaat 600 

ttggacactc tagtagctgt aatagaatgc cagcaaccag aagcagactt atacagattc 660 

atgggacgaa tgatcataac ccaacaaatg gaagaaattg taagacctct ggggccggag 720 

agtctcctgc ttcgtggagc cagattaaaa aacacaaaag aaatttttgg tgttgcggta 780 

tacactggaa tggaaactaa gatggcatta aattacaaga gcaaatcaca gaaacgatct 840 

gcagtagaaa agtcaatgaa tacatttttg ataatttatc tagtaattct tatatctgaa 900 

gctgtcatca gcactatctt gaagtataca tggcaagctg aagaaaaatg ggatgaacct 960 

tggtataacc aaaaaacaga acatcaaaga aatagcaatt ctgagattta tttcagactt 1020 

ccttgctttt ttggttctct acaatttcat cattccaatt tcattatatg tgacagtcga 1080 

aatgcagaaa tttcttggat cattttttat tggctgggat cttga 1125 



<210> SEQ ID NO 4 

<211> LENGTH: 374 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 4 

Met Trp Arg Trp Ile Arg Gln Gln Leu Gly Phe Asp Pro Pro His Gln
1 5 10 15

Ser Asp Thr Arg Thr Ile Tyr Val Ala Asn Arg Phe Pro Gln Asn Gly
20 25 30

Leu Tyr Thr Pro Gln Lys Phe Ile Asp Asn Arg Ile Ile Ser Ser Lys
35 40 45

Tyr Thr Val Trp Asn Phe Val Pro Lys Asn Leu Phe Glu Gln Phe Arg
50 55 60

Arg Val Ala Asn Phe Tyr Phe Leu Ile Ile Phe Leu Val Gln Leu Met
65 70 75 80

Ile Asp Thr Pro Thr Ser Pro Val Thr Ser Gly Leu Pro Leu Phe Phe
85 90 95

Val Ile Thr Val Thr Ala Ile Lys Gln Gly Tyr Glu Asp Trp Leu Arg
100 105 110

His Asn Ser Asp Asn Glu Val Asn Gly Ala Pro Val Tyr Val Val Arg
115 120 125

Ser Gly Gly Leu Val Lys Thr Arg Ser Lys Asn Ile Arg Val Gly Asp
130 135 140

Ile Val Arg Ile Ala Lys Asp Glu Ile Phe Pro Ala Asp Leu Val Leu
145 150 155 160

Leu Ser Ser Asp Arg Leu Asp Gly Ser Cys His Val Thr Thr Ala Ser
165 170 175

Leu Asp Gly Glu Thr Asn Leu Lys Thr His Val Ala Val Pro Glu Thr
180 185 190

Ala Leu Leu Gln Thr Val Ala Asn Leu Asp Thr Leu Val Ala Val Ile
195 200 205

Glu Cys Gln Gln Pro Glu Ala Asp Leu Tyr Arg Phe Met Gly Arg Met
210 215 220

Ile Ile Thr Gln Gln Met Glu Glu Ile Val Arg Pro Leu Gly Pro Glu
225 230 235 240

Ser Leu Leu Leu Arg Gly Ala Arg Leu Lys Asn Thr Lys Glu Ile Phe
245 250 255

Gly Val Ala Val Tyr Thr Gly Met Glu Thr Lys Met Ala Leu Asn Tyr
260 265 270

Lys Ser Lys Ser Gln Lys Arg Ser Ala Val Glu Lys Ser Met Asn Thr
275 280 285

Phe Leu Ile Ile Tyr Leu Val Ile Leu Ile Ser Glu Ala Val Ile Ser
290 295 300

Thr Ile Leu Lys Tyr Thr Trp Gln Ala Glu Glu Lys Trp Asp Glu Pro
305 310 315 320

Trp Tyr Asn Gln Lys Thr Glu His Gln Arg Asn Ser Asn Ser Glu Ile
325 330 335

Tyr Phe Arg Leu Pro Cys Phe Phe Gly Ser Leu Gln Phe His His Ser
340 345 350

Asn Phe Ile Ile Cys Asp Ser Arg Asn Ala Glu Ile Ser Trp Ile Ile
355 360 365

Phe Tyr Trp Leu Gly Ser
370
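
As a hedged cross-check of the listing, and not as part of the specification, the nucleotide and
amino acid entries can be compared by translating the first row of SEQ ID NO 3 with the standard
genetic code and setting the result against the first twenty residues listed for SEQ ID NO 4. The
function names (translate, expected_residues) are illustrative assumptions. The length check
assumes that each coding sequence ends in a single stop codon, as the final "tga" of SEQ ID NO 3
and the final "taa" of SEQ ID NO 1 suggest.

```python
# Standard genetic code, codons ordered t, c, a, g at each of the three positions.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Only the three-letter codes needed for the residues compared below.
THREE_TO_ONE = {
    "Met": "M", "Trp": "W", "Arg": "R", "Ile": "I", "Gln": "Q", "Leu": "L", "Gly": "G",
    "Phe": "F", "Asp": "D", "Pro": "P", "His": "H", "Ser": "S", "Thr": "T",
}


def translate(dna):
    """Translate from the first base, ignoring whitespace, stopping at a stop codon."""
    seq = "".join(dna.lower().split())
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def expected_residues(nucleotide_length):
    """Residues encoded by a coding sequence of this length that ends in one stop codon."""
    return nucleotide_length // 3 - 1


# First row (positions 1-60) of SEQ ID NO 3 and residues 1-20 of SEQ ID NO 4.
SEQ3_FIRST_ROW = "atgtggcgct ggatccggca gcagctgggt tttgacccac cacatcagag tgacacaaga"
SEQ4_FIRST_20 = "Met Trp Arg Trp Ile Arg Gln Gln Leu Gly Phe Asp Pro Pro His Gln Ser Asp Thr Arg"

translated = translate(SEQ3_FIRST_ROW)
listed = "".join(THREE_TO_ONE[r] for r in SEQ4_FIRST_20.split())
print(translated == listed)
print(expected_residues(1125), expected_residues(3534))
```

Under those assumptions the stated lengths agree with one another: 1125 nucleotides correspond to
the 374 residues given for SEQ ID NO 4, and 3534 nucleotides to the 1177 residues given for SEQ ID
NO 2.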



<210> SEQ ID NO 5 

<211> LENGTH: 7277 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 5 

gccgcgggat gggaacgcgg cgcggggagt gaggcagtgg cggcggcggc ggtaagcgga 60 

acttcggccc gaggggctcg cccgctcccg cctctgtctt gtcggcctcc acctgcagcc 120 

ccgcggcccc cgcgccccgc gggacccgga cggcgacgac gggggaatgt ggcgctggat 180 

ccggcagcag ctgggttttg acccaccaca tcagagtgac acaagaacca tctacgtagc 240

caacaggttt cctcagaatg gcctttacac acctcagaaa tttatagata acaggatcat 300 

ttcatctaag tacactgtgt ggaattttgt tccaaaaaat ttatttgaac agttcagaag 360 

agtggcaaac ttttattttc ttattatatt tttggttcag cttatgattg atacacctac 420 

cagtccagtt accagtggac ttccattatt ctttgtgata acagtaactg ccataaagca 480 

gggatatgaa gattggttac ggcataactc agataatgaa gtaaatggag ctcctgttta 540 

tgttgttcga agtggtggcc ttgtaaaaac tagatcaaaa aacattcggg tgggtgatat 600 

tgttcgaata gccaaagatg aaatttttcc tgcagacttg gtgcttctgt cctcagatcg 660 

actggatggt tcctgtcacg ttacaactgc tagtttggac ggagaaacta acctgaagac 720 

acatgtggca gttccagaaa cagcattatt acaaacagtt gccaatttgg acactctagt 780 

agctgtaata gaatgccagc aaccagaagc agacttatac agattcatgg gacgaatgat 840 

cataacccaa caaatggaag aaattgtaag acctctgggg ccggagagtc tcctgcttcg 900 

tggagccaga ttaaaaaaca caaaagaaat ttttggtgtt gcggtataca ctggaatgga 960 

aactaagatg gcattaaatt acaagagcaa atcacagaaa cgatctgcag tagaaaagtc 1020 

aatgaataca tttttgataa tttatctagt aattcttata tctgaagctg tcatcagcac 1080 

tatcttgaag tatacatggc aagctgaaga aaaatgggat gaaccttggt ataaccaaaa 1140 

aacagaacat caaagaaata gcagtaagat tctgagattt atttcagact tccttgcttt 1200 

tttggttctc tacaatttca tcattccaat ttcattatat gtgacagtcg aaatgcagaa 1260 

atttcttgga tcatttttta ttggctggga tcttgatctg tatcatgaag aatcagatca 1320 

gaaagctcaa gtcaatactt ccgatctgaa tgaagagctt ggacaggtag agtacgtgtt 1380 

tacagataaa actggtacac tgacagaaaa tgagatgcag tttcgggaat gttcaattaa 1440 

tggcatgaaa taccaagaaa ttaatggtag acttgtaccc gaaggaccaa caccagactc 1500 

ttcagaagga aacttatctt atcttagtag tttatcccat cttaacaact tatcccatct 1560 

tacaaccagt tcctctttca gaaccagtcc tgaaaatgaa actgaactaa ttaaagaaca 1620 

tgatctcttc tttaaagcag tcagtctctg tcacactgta cagattagca atgttcaaac 1680 

tgactgcact ggtgatggtc cctggcaatc caacctggca ccatcgcagt tggagtacta 1740 

tgcatcttca ccagatgaaa aggctctagt agaagctgct gcaaggattg gtattgtgtt 1800 

tattggcaat tctgaagaaa ctatggaggt taaaactctt ggaaaactgg aacggtacaa 1860 

actgcttcat attctggaat ttgattcaga tcgtaggaga atgagtgtaa ttgttcaggc 1920 

accttcaggt gagaagttat tatttgctaa aggagctgag tcatcaattc tccctaaatg 1980 

tataggtgga gaaatagaaa aaaccagaat tcatgtagat gaatttgctt tgaaagggct 2040 

aagaactctg tgtatagcat atagaaaatt tacatcaaaa gagtatgagg aaatagataa 2100 

acgcatattt gaagccagga ctgccttgca gcagcgggaa gagaaattgg cagctgtttt 2160 

ccagttcata gagaaagacc tgatattact tggagccaca gcagtagaag acagactaca 2220 

agataaagtt cgagaaacta ttgaagcatt gagaatggct ggtatcaaag tatgggtact 2280 

tactggggat aaacatgaaa cagctgttag tgtgagttta tcatgtggcc attttcatag 2340 

aaccatgaac atccttgaac ttataaacca gaaatcagac agcgagtgtg ctgaacaatt 2400 

gaggcagctt gccagaagaa ttacagagga tcatgtgatt cagcatgggc tggtagtgga 2460 

tgggaccagc ctatctcttg cactcaggga gcatgaaaaa ctatttatgg aagtttgcag 2520 

aaattgttca gctgtattat gctgtcgtat ggctccactg cagaaagcaa aagtaataag 2580 

actaataaaa atatcacctg agaaacctat aacattggct gttggtgatg gtgctaatga 2640 

cgtaagcatg atacaagaag cccatgttgg cataggaatc atgggtaaag aaggaagaca 2700 

ggctgcaaga aacagtgact atgcaatagc cagatttaag ttcctctcca aattgctttt 2760 

tgttcatggt catttttatt atattagaat agctaccctt gtacagtatt ttttttataa 2820 

gaatgtgtgc tttatcacac cccagttttt atatcagttc tactgtttgt tttctcagca 2880 

aacattgtat gacagcgtgt acctgacttt atacaatatt tgttttactt ccctacctat 2940 

tctgatatat agtcttttgg aacagcatgt agaccctcat gtgttacaaa ataagcccac 3000 

cctttatcga gacattagta aaaaccgcct cttaagtatt aaaacatttc tttattggac 3060 

catcctgggc ttcagtcatg cctttatttt cttttttgga tcctatttac taatagggaa 3120 

agatacatct ctgcttggaa atggccagat gttyggaaac tggacatttg gcactttggt 3180 

cttcacagtc atggttatta cagtcacagt aaagatggct ctggaaactc atttttggac 3240 


ttggatcaac catctcgtta cctggggatc tattatattt tattttgtat tttccttgtt 3300 

ttatggaggg attctctggc catttttggg ctcccagaat atgtattttg tgtttattca 3360 

gctcctgtca agtggttctg cttggtttgc cataatcctc atggttgtta catgtctatt 3420 

tcttgatatc ataaagaagg tctttgaccg acacctccac cctacaagta ctgaaaaggc 3480 

acagcttact gaaacaaatg caggtatcaa gtgcttggac tccatgtgct gtttcccgga 3540 

aggagaagca gcgtgtgcat ctgttggaag aatgctggaa cgagttatag gaagatgtag 3600 

tccaacccac atcagcagat catggagtgc atcggatcct ttctatacca acgacaggag 3660 

catcttgact ctctccacaa tggactcatc tacttgttaa aggggcagta gtactttgtg 3720 

ggagccagtt cacctccttt cctaaaattc agtgtgatca ccctgttaat ggccacacta 3780 

gctctgaaat taatttccaa aatctttgta gtagttcata cccactcaga gttataatgg 3840 

caaacaaaca gaaagcatta gtacaagccc ctcccaacac ccttaatttg aatctgaaca 3900 

tgttaaaatt tgagaataaa gagacatttt tcatctcttt gtctggtttg tcccttgtgc 3960 

ttatgggact cctaatggca tttcagtctg ttgctgaggc cattatattt taatataaat 4020 

gtagaaaaaa gagagaaatc ttagtaaaga gtatttttta gtattagctt gattattgac 4080 

tcttctattt aaatctgctt ctgtaaatta tgctgaaagt ttgccttgag aactctattt 4140 

ttttattaga gttatattta aagcttttca tgggaaaagt taatgtgaat actgaggaat 4200 

tttggtccct cagtgacctg tgttgttaat tcattaatgc attctgagtt cacagagcaa 4260 

attaggagaa tcatttccaa ccattattta ctgcagtatg gggagtaaat ttataccaat 4320 

tcctctaact gtactgtaac acagcctgta aagttagcca tataaatgca agggtatatc 4380 

atatatacaa atcaggaatc aggtccgttc accgaacttc aaattgatgt ttactaatat 4440 

ttttgtgaca gagtataaag accctatagt gggtaaatta gatactatta gcatattatt 4500 

aatttaatgt ctttatcatt ggatcttttg catgctttaa tctggttaac atatttaaat 4560 

ttgctttttt tctctttacc tgaaggctct gtgtatagta tttcatgaca tcgttgtaca 4620 

gtttaactat atcaataaaa agtttggaca gtatttaaat attgcaaata tgtttaatta 4680 

tacaaatcag aatagtatgg gtaattaaat gaatacaaaa agaagagcct ctttctgcag 4740 

ccgacttaga catgctcttc cctttctata agctagattt tagaataaag ggtttcagtt 4800 

aataatctta ttttcaggtt atgtcatcta acttatagca aactaccaca atacagtgag 4860 

ttctgccagt gtcccagtac aaggcatatt tcaggtgtgg ctgtggaatg taaaaatgct 4920 

caacttgtat caggtaotgt tagcaataaa ttaaatgcta agaatgatta otcgggtaca 4980 

tgttactgta attaactcat tgcacttcaa aacctaactt ccatcctgaa tttatcaagt 5040 

agttcagtat tgtcatttgt ttttgtttta ttgaaaagta atgttgtctt aagatttaga 5100 

agtgattatt agcttgagaa ctattaccca gctctaagca aataatgatt gtatacatat 5160 

taagataatg gttaaatgcg gttttaccaa gttttccctt gaaaatgtaa ttcctttatg 5220 

gagatttatt gtgcagccct aagcttcctt cccatttcat gaatataagg cttctagaat 5280 

tggactggca ggggaaagaa tggtagagac agaaattaag actttatcct tgtttgcttg 5340 

taaactatta ttttcttgct aatgtaacat ttgtctgttc cagtgatgta aggatattaa 5400 

gttattaagc taaatattaa ttttcaaaaa tagtccttct ttaacttaga tatttcatag 5460 

ctggatttag gaagatctgt tattctggaa gtactaaaaa gaataataca acgtacaatg 5520 

tctgcattca ctaattcatg ttccagaaga ggaaataatg aagatatact cagtagagta 5580 

ctaggtggga ggatatggaa atttgctcat aaaatctctt ataaaacgtg catataacaa 5640 



aatgacaccc agtaggcctg cattacattt acatgaccgt gtttatttgc catcaaataa 5700 

actgagtact gacaccagac aaagactcca aagtcataaa atagcctatg accaactgca 5760 

gcaagacagg aggtcagctc gcctataatg gtgcttaaag tgtgattgat gtaattttct 5820 

gtactcacca tttgaagtta gttaaggaga actttatttt tttaaaaaaa gtaaatggca 5880 

accactagtg tgctcatcct gaactgttac tccaaatcca ctccgttttt aaagcaaaat 5940 

tatcttgtga ttttaagaaa agagttttct atttatttaa gaaag^aaca atgcQgttty 6000 

caagctttca gtagttttct agtgctatat tcatcctgta aaactcttac tacgtaacca 6060 

gtaatcacaa ggaaagtgtc ccctttgcat atttctttaa aattctttct ttggaaagta 6120 

tgatgttgat aattaactta cccttatctg ccaaaaccag agcaaaatgc taaatacgtt 6180 

attgctaatc agtggtctca aatcgatttg cctccctttg cctcgtctga gggctgtaag 6240 

cctgaagata gtggcaagca ccaagtcagt ttccaaaatt gcccctcagc tgctttaagt 6300 

gactcagcac cctgcctcag cttcagcagg cstaggctca ccctgggcgg agcaaagtat 6360 

gggccaggga gaactacagc tacgaagacc tgctgtcgag ttgagaaaag gggagaattt 6420 

atggtctgaa ttttctaact gtcctctttc ttgggtctaa agctcataat acacaaaggc 6480 

ttccagacct gagccacacc caggccctat cctgaacagg agactaaaca gaggcaaatc 6540 

aaccctagga aatacttgca ttctgcccta cggttagtac caggactgag gtcatttcta 6600 

ctggaaaaga ttgtgagatt gaacttatct gatcgcttga gactcctaat aggcaggagt 6660 

caaggccact agaaaattga cagttaagag ccaaaagttt ttaaaatatg ctactctgaa 6720 

aaatctcgtg aaggctgtag gaaaagggag aatcttccat gttggtgttt ttcctgtaaa 6780 

gatcagtttg gggtatgata taagcaggta ttaataaaaa taacacacca aagagttacg 6840 

taaaacatgt tttattaatt ttggtcccca cgtacagaca ttttatttct attttgaaat 6900 

gagttatcta ttttcataaa agtaaaacac tattaaagtg ctgttttatg tgaaataact 6960 

tgaatgttgt tcctataaaa aatagatcat aactcatgat atgtttgtaa tcatggtaat 7020 

ttagattttt atgaggaatg agtatctgga aatattgtag caatacttgg tttaaaattt 7080 

tggacctgag acactgtggc tgtctaatgt aatcctttaa aaattctctg cattgtcagt 7140 

aaatgtagta tattattgta cagctactca taatttttta aagtttatga agttatattt 7200 

atcaaataaa aactttccta tataattaaa aaaaaaaaaa aaaaaaaaaa aaaaaacaaa 7260 

aaaaaaaaaa aaaaaaa 7277 



<210> SEQ ID NO 6 

<211> LENGTH: 2913 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 6 

atggcacaac tagagaggag cgccatctct ggcttcagct ctaagtccag gcgaaactca 60 

ttcgcatatg atgttaagcg tgaagtatac aatgaggaga cctttcaaca ggaacacaaa 120 

aggaaggcct cctcttctgg gaacatgaac atcaacatca ccaccttcag acaccacgtc 180 

cagtgccgct gctcatggca caggttccta cgatgcrtgc ttacaatctt tcccttccta 240 

gaatggatgt gtatgtatcg attaaaggat tggcttctgg gagacttact tgctggtata 300 

agtgttggcc ttgtgcaagt tccccaaggc ctgacactta gtttgctggc aaggcaactg 360 

attcctcctc tcaacatcgc ttatgcagct ttctgttctt cggtaatcta tgtaattttt 420 

ggatcgtgtc atcaaatgtc cgttggttcc ttcttcctgg tgagtgctct gctgatcaac 480 

gttctgaaag tgagcccatt caacaacggt caactggtca tgggatcttt cgtcaagaat 540 

gagttttcgg ccccctccta ccttatgggc tataataaat ccttgagtgt ggtggcaacc 600 

acaacttttc tgactgggat tattcagcta ataatgggcg tattgggttt gggcttcatt 660 

gccacttacc ttccggagtc tgcaatgaat gcttacctgg ctgctgtggc acttcatatc 720 

atgctgtccc agctgacttt catctttggg attatgatta gtttccatgc cggtcccatc 780 

tccttcttct atgacataat taattactgt gtagctctcc caaaagcgaa ttccaccagc 840 

attctagtat ttctaactgt tgttgttgct ctgcgaatca acaaatgtat cagaatttct 900 

ttcaatcagt atcccattga gtttcccatg gaattatttc tgattattgg cttcactgtg 960 

attgcaaaca agataagcat ggccacagaa accagccaga cgcttattga catgattcct 1020 

tatagctttc tgcttcctgt aacaccagat ttcagccttc ttcccaagat aattttacaa 1080 

gccttctcct tatctttggt gagctccttt ctgctcatat ttctgggcaa gaagattgcc 1140 

agtcttcaca attacagtgt caattccaac caggatttaa tagccatcgg cctttgcaat 1200 

gtcgtcagtt catttttcag atcttgtgtg tttactggtg ctattgctag gactattatc 1260 

caggataaat ctggaggaag acaacagttt gcatctctgg taggcgcagg tgtgatgctg 1320 

ctcctgatgg tgaagatggg acactttttc tacacactgc caaatgctgt gctggctggt 1380 

attattctga gcaacgtcat tccctacctt gaaaccattt ctaacctacc cagcctgtgg 1440 

aggcaggacc aatatgactg tgctctttgg atgatgacat tctcatcttc aattttcctg 1500 

ggactggaca ttggactaat tatctcagta gtttctgctt tcttcatcac cactgttcgt 1560 

tcacacagag ctaagattct tctcctgggt caaatcccta acaccaacat ttatagaagc 1620 

atcaatgatt atcgggagat catcaccatt cctggggtga aaatcttcca gtgctgcagc 1680 

tcaattacat ttgtaaatgt ttactaccta aagcataagc tgttaaaaga ggttgatatg 1740 

gtaaaggtgc ctcttaaaga agaagaaatt ttcagcttgt ttaattcaag tgacaccaat 1800 

ctacaaggag gaaagatttg caggtgtttc tgcaactgtg atgatctgga gccgctgccc 1860 

aggattcttt acacagagcg atttgaaaat aaactggatc ccgaagcatc ctccattaac 1920 

ctgattcact gctcacattt tgagagcatg aacacaagcc aaactgcatc cgaagaccaa 1980 

gtgccataca cagtatcgtc cgtgtctcag aaaaatcaag ggcaacagta tgaggaggtg 2040 

gaggaagttt ggcttcctaa taactcatca agaaacagct caccaggact gcctgatgtg 2100 

gcggaaagcc aggggaggag atcactcatc ccttactcag atgcgtctct actgcccagt 2160 

gtccacacca tcatcctgga tttctccatg gtacactacg tggattcacg ggggttagtc 2220 

gtattaagac agatatgcaa tgcctttcaa aacgccaaca ttttgatact cattgcaggg 2280 

tgtcactctt ccatagtcag ggcatttgag aggaatgatt tctttgacgc tggcatcacc 2340 

aagacccagc tgttcctcag cgttcacgac gccgtgctgt ttgccttgtc aaggaaggtc 2400 

ataggctcct ctgagttaag catcgatgaa tccgagacag tgatacggga aacctactca 2460 

gaaacagaca agaatgacaa ttcaagatat aaaatgagca gcagttttct aggaagccaa 2520 

aaaaatgtaa gtccaggctt catcaagatc caacagcctg tagaagagga gtcggagttg 2580 

gatttggagc tggaatcaga acaagaggct gggctgggtc tggacctaga cctggatcgg 2640 

gagctggagc ctgaaatgga gcccaaggct gagaccgaga ccaagaccca gaccgagatg 2700 

gagccccagc ctgagactga gcctgagatg gagcccaacc ccaaatctag gccaagagct 2760 

cacacttttc ctcagcagcg ttactggcct atgtatcatc cgtctatggc ttccacccag 2820 

tctcagactc agactcggac atggtcagtg gagaggagac gccatcctat ggattcatac 2880 

tcaccagagg gcaacagcaa tgaagatgtc tag 2913 

<210> SEQ ID NO 7 

<211> LENGTH: 970 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<220> FEATURE: 

<221> NAME/KEY: VARIANT 

<222> LOCATION: (1)...(970) 

<223> OTHER INFORMATION: Xaa - Any Amino Acid 
<400> SEQUENCE: 7 

Met Ala Gln Leu Glu Arg Ser Ala Ile Ser Gly Phe Ser Ser Lys Ser 
1 5 10 15 

Arg Arg Asn Ser Phe Ala Tyr Asp Val Lys Arg Glu Val Tyr Asn Glu 
20 25 30 

Glu Thr Phe Gln Gln Glu His Lys Arg Lys Ala Ser Ser Ser Gly Asn 
35 40 45 

Met Asn Ile Asn Ile Thr Thr Phe Arg His His Val Gln Cys Arg Cys 
50 55 60 

Ser Trp His Arg Phe Leu Arg Cys Met Leu Thr Ile Phe Pro Phe Leu 
65 70 75 80 

Glu Trp Met Cys Met Tyr Arg Leu Lys Asp Trp Leu Leu Gly Asp Leu 
85 90 95 

Leu Ala Gly Ile Ser Val Gly Leu Val Gln Val Pro Gln Gly Leu Thr 
100 105 110 

Leu Ser Leu Leu Ala Arg Gln Leu Ile Pro Pro Leu Asn Ile Ala Tyr 
115 120 125 

Ala Ala Phe Cys Ser Ser Val Ile Tyr Val Ile Phe Gly Ser Cys His 
130 135 140 

Gln Met Ser Val Gly Ser Phe Phe Leu Val Ser Ala Leu Leu Ile Asn 
145 150 155 160 

Val Leu Lys Val Ser Pro Phe Asn Asn Gly Gln Leu Val Met Gly Ser 
165 170 175 

Phe Val Lys Asn Glu Phe Ser Ala Pro Ser Tyr Leu Met Gly Tyr Asn 
180 185 190 

Lys Ser Leu Ser Val Val Ala Thr Thr Thr Phe Leu Thr Gly Ile Ile 
195 200 205 

Gln Leu Ile Met Gly Val Leu Gly Leu Gly Phe Ile Ala Thr Tyr Leu 
210 215 220 

Pro Glu Ser Ala Met Asn Ala Tyr Leu Ala Ala Val Ala Leu His Ile 
225 230 235 240 

Met Leu Ser Gln Leu Thr Phe Ile Phe Gly Ile Met Ile Ser Phe His 
245 250 255 

Ala Gly Pro Ile Ser Phe Phe Tyr Asp Ile Ile Asn Tyr Cys Val Ala 
260 265 270 

Leu Pro Lys Ala Asn Ser Thr Ser Ile Leu Val Phe Leu Thr Val Val 
275 280 285 

Val Ala Leu Arg Ile Asn Lys Cys Ile Arg Ile Ser Phe Asn Gln Tyr 
290 295 300 

Pro Ile Glu Phe Pro Met Glu Leu Phe Leu Ile Ile Gly Phe Thr Val 
305 310 315 320 

Ile Ala Asn Lys Ile Ser Met Ala Thr Glu Thr Ser Gln Thr Leu Ile 
325 330 335 

Asp Met Ile Pro Tyr Ser Phe Leu Leu Pro Val Thr Pro Asp Phe Ser 
340 345 350 

Leu Leu Pro Lys Ile Ile Leu Gln Ala Phe Ser Leu Ser Leu Val Ser 
355 360 365 

Ser Phe Leu Leu Ile Phe Leu Gly Lys Lys Ile Ala Ser Leu His Asn 
370 375 380 

Tyr Ser Val Asn Ser Asn Gln Asp Leu Ile Ala Ile Gly Leu Cys Asn 
385 390 395 400 

Val Val Ser Ser Phe Phe Arg Ser Cys Val Phe Thr Gly Ala Ile Ala 
405 410 415 

Arg Thr Ile Ile Gln Asp Lys Ser Gly Gly Arg Gln Gln Phe Ala Ser 
420 425 430 

Leu Val Gly Ala Gly Val Met Leu Leu Leu Met Val Lys Met Gly His 
435 440 445 

Phe Phe Tyr Thr Leu Pro Asn Ala Val Leu Ala Gly Ile Ile Leu Ser 
450 455 460 

Asn Val Ile Pro Tyr Leu Glu Thr Ile Ser Asn Leu Pro Ser Leu Trp 
465 470 475 480 

Arg Gln Asp Gln Tyr Asp Cys Ala Leu Trp Met Met Thr Phe Ser Ser 
485 490 495 

Ser Ile Phe Leu Gly Leu Asp Ile Gly Leu Ile Ile Ser Val Val Ser 
500 505 510 

Ala Phe Phe Ile Thr Thr Val Arg Ser His Arg Ala Lys Ile Leu Leu 
515 520 525 

Leu Gly Gln Ile Pro Asn Thr Asn Ile Tyr Arg Ser Ile Asn Asp Tyr 
530 535 540 

Arg Glu Ile Ile Thr Ile Pro Gly Val Lys Ile Phe Gln Cys Cys Ser 
545 550 555 560 

Ser Ile Thr Phe Val Asn Val Tyr Tyr Leu Lys His Lys Leu Leu Lys 
565 570 575 

Glu Val Asp Met Val Lys Val Pro Leu Lys Glu Glu Glu Ile Phe Ser 
580 585 590 

Leu Phe Asn Ser Ser Asp Thr Asn Leu Gln Gly Gly Lys Ile Cys Arg 
595 600 605 

Cys Phe Cys Asn Cys Asp Asp Leu Glu Pro Leu Pro Arg Ile Leu Tyr 
610 615 620 

Thr Glu Arg Phe Glu Asn Lys Leu Asp Pro Glu Ala Ser Ser Ile Asn 
625 630 635 640 

Leu Ile His Cys Ser His Phe Glu Ser Met Asn Thr Ser Gln Thr Ala 
645 650 655 

Ser Glu Asp Gln Val Pro Tyr Thr Val Ser Ser Val Ser Gln Lys Asn 
660 665 670 

Gln Gly Gln Gln Tyr Glu Glu Val Glu Glu Val Trp Leu Pro Asn Asn 
675 680 685 

Ser Ser Arg Asn Ser Ser Pro Gly Leu Pro Asp Val Ala Glu Ser Gln 
690 695 700 

Gly Arg Arg Ser Leu Ile Pro Tyr Ser Asp Ala Ser Leu Leu Pro Ser 
705 710 715 720 

Val His Thr Ile Ile Leu Asp Phe Ser Met Val His Tyr Val Asp Ser 
725 730 735 

Arg Gly Leu Val Val Leu Arg Gln Ile Cys Asn Ala Phe Gln Asn Ala 
740 745 750 

Asn Ile Leu Ile Leu Ile Ala Gly Cys His Ser Ser Ile Val Arg Ala 
755 760 765 

Phe Glu Arg Asn Asp Phe Phe Asp Ala Gly Ile Thr Lys Thr Gln Leu 
770 775 780 

Phe Leu Ser Val His Asp Ala Val Leu Phe Ala Leu Ser Arg Lys Val 
785 790 795 800 

Ile Gly Ser Ser Glu Leu Ser Ile Asp Glu Ser Glu Thr Val Ile Arg 
805 810 815 

Glu Thr Tyr Ser Glu Thr Asp Lys Asn Asp Asn Ser Arg Tyr Lys Met 
820 825 830 

Ser Ser Ser Phe Leu Gly Ser Gln Lys Asn Val Ser Pro Gly Phe Ile 
835 840 845 

Lys Ile Gln Gln Pro Val Glu Glu Glu Ser Glu Leu Asp Leu Glu Leu 
850 855 860 

Glu Ser Glu Gln Glu Ala Gly Leu Gly Leu Asp Leu Asp Leu Asp Arg 
865 870 875 880 

Glu Leu Glu Pro Glu Met Glu Pro Lys Ala Glu Thr Glu Thr Lys Thr 
885 890 895 

Gln Thr Glu Met Glu Pro Gln Pro Glu Thr Glu Pro Glu Met Glu Pro 
900 905 910 

Asn Pro Lys Ser Arg Pro Arg Ala His Thr Phe Pro Gln Gln Arg Tyr 
915 920 925 

Trp Pro Met Tyr His Pro Ser Met Ala Ser Thr Gln Ser Gln Thr Gln 
930 935 940 

Thr Arg Thr Trp Ser Val Glu Arg Arg Arg His Pro Met Asp Ser Tyr 
945 950 955 960 

Ser Pro Glu Gly Asn Ser Asn Glu Asp Val 
965 970 



<210> SEQ ID NO 8 

<211> LENGTH: 3749 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 8 

ttttccaact ccccatctcc tccctcctca gattaaaaga agttatatgg actttgtgat 60 

gttttctgcc gctttgtgaa gtaggcctta tttctcttgt cctttcgtac agggaggaat 120 

ttgaagtaga tagaaaccga cctggattac tccggtctga actcagatca cgtaggactt 180 

taatcgttga acaaacgaac ctttaatagc ggctgcacca tcgggatgtc ctgatccaac 240 

atcgaggtcg taaaccctat tgttgatatg gactctagaa taggattgcg ctgttatccc 300 

tagggtaact tgttccgttg gtcaagttat tggatcaatt gagtatagta gttcgctttg 360 

actggtgaag tcttggcatg tactgctcgg aggttgggtt ctgctccgag gtcgccccaa 420 

ccgaaatttt taatgcagga gcgcccgcac tcccgccccc gccaaggagc caggaatggc 480 

acaactagag aggagcgcca tctctggctt cagctctaag tccaggcgaa actcattcgc 540 

atatgatgtt aagcgtgaag tatacaatga ggagaccttt caacaggaac acaaaaggaa 600 

ggcctcctct tctgggaaca tgaacatcaa catcaccacc ttcagacacc acgtccagtg 660 

ccgctgctca tggcacaggt tcctacgatg crtgcttaca atctttccct tcctagaatg 720 

gatgtgtatg tatcgattaa aggattggct tctgggagac ttacttgctg gtataagtgt 780 

tggccttgtg caagttcccc aaggcctgac acttagtttg ctggcaaggc aactgattcc 840 

tcctctcaac atcgcttatg cagctttctg ttcttcggta atctatgtaa tttttggatc 900 

gtgtcatcaa atgtccgttg gttccttctt cctggtgagt gctctgctga tcaacgttct 960 



gaaagtgagc ccattcaaca acggtcaact ggtcatggga tctttcgtca agaatgagtt 1020 

ttcggccccc tcctacctta tgggctataa taaatccttg agtgtggtgg caaccacaac 1080 

ttttctgact gggattattc agctaataat gggcgtattg ggtttgggct tcattgccac 1140 

ttaccttccg gagtctgcaa tgaatgctta cctggctgct gtggcacttc atatcatgct 1200 

gtcccagctg actttcatct ttgggattat gattagtttc catgccggtc ccatctcctt 1260 

cttctatgac ataattaatt actgtgtagc tctcccaaaa gcgaattcca ccagcattct 1320 

agtatttcta actgttgttg ttgctctgcg aatcaacaaa tgtatcagaa tttctttcaa 1380 

tcagtatccc attgagtttc ccatggaatt atttctgatt attggcttca ctgtgattgc 1440 

aaacaagata agcatggcca cagaaaccag ccagacgctt attgacatga ttccttatag 1500 

ctttctgctt cctgtaacac cagatttcag ccttcttccc aagataattt tacaagcctt 1560 

ctccttatct ttggtgagct cctttctgct catatttctg ggcaagaaga ttgccagtct 1620 

tcacaattac agtgtcaatt ccaaccagga tttaatagcc atcggccttt gcaatgtcgt 1680 

cagttcattt ttcagatctt gtgtgtttac tggtgctatt gctaggacta ttatccagga 1740 

taaatctgga ggaagacaac agtttgcatc tctggtaggc gcaggtgtga tgctgctcct 1800 

gatggtgaag atgggacact ttttctacac actgccaaat gctgtgctgg ctggtattat 1860 

tctgagcaac gtcattccct accttgaaac catttctaac ctacccagcc tgtggaggca 1920 

ggaccaatat gactgtgctc tttggatgat gacattctca tcttcaattt tcctgggact 1980 

ggacattgga ctaattatct cagtagtttc tgctttcttc atcaccactg ttcgttcaca 2040 

cagagctaag attcttctcc tgggtcaaat ccctaacacc aacatttata gaagcatcaa 2100 

tgattatcgg gagatcatca ccattcctgg ggtgaaaatc ttccagtgct gcagctcaat 2160 

tacatttgta aatgtttact acctaaagca taagctgtta aaagaggttg atatggtaaa 2220 

ggtgcctctt aaagaagaag aaattttcag cttgtttaat tcaagtgaca ccaatctaca 2280 

aggaggaaag atttgcaggt gtttctgcaa ctgtgatgat ctggagccgc tgcccaggat 2340 

tctttacaca gagcgatttg aaaataaact ggatcccgaa gcatcctcca ttaacctgat 2400 

tcactgctca cattttgaga gcatgaacac aagccaaact gcatccgaag accaagtgcc 2460 

atacacagta tcgtccgtgt ctcagaaaaa tcaagggcaa cagtatgagg aggtggagga 2520 

agtttggctt cctaataact catcaagaaa cagctcacca ggactgcctg atgtggcgga 2580 

aagccagggg aggagatcac tcatccctta ctcagatgcg tctctactgc ccagtgtcca 2640 

caccatcatc ctggatttct ccatggtaca ctacgtggat tcacgggggt tagtcgtatt 2700 

aagacagata tgcaatgcct ttcaaaacgc caacattttg atactcattg cagggtgtca 2760 

ctcttccata gtcagggcat ttgagaggaa tgatttcttt gacgctggca tcaccaagac 2820 

ccagctgttc ctcagcgttc acgacgccgt gctgtttgcc ttgtcaagga aggtcatagg 2880 

ctcctctgag ttaagcatcg atgaatccga gacagtgata cgggaaacct actcagaaac 2940 

agacaagaat gacaattcaa gatataaaat gagcagcagt tttctaggaa gccaaaaaaa 3000 

tgtaagtcca ggcttcatca agatccaaca gcctgtagaa gaggagtcgg agttggattt 3060 

ggagctggaa tcagaacaag aggctgggct gggtctggac ctagacctgg atcgggagct 3120 

ggagcctgaa atggagccca aggctgagac cgagaccaag acccagaccg agatggagcc 3180 

ccagcctgag actgagcctg agatggagcc caaccccaaa tctaggccaa gagctcacac 3240 

ttttcctcag cagcgttact ggcctatgta tcatccgtct atggcttcca cccagtctca 3300 



gactcagact cggacatggt cagtggagag gagacgccat cctatggatt catactcacc 3360 

agagggcaac agcaatgaag atgtctagga gatgaactag aaataagggg tcagataatg 3420 

ctggcaaatc ctcctaccca aaaaggggtc aattgtccag agacctagac tggatacgaa 3480 

ctagcagtac ttccttcctg actgtgactc ctactacctg ccagccttct tccttgctct 3540 

gcgctgggat catactccca aatcacatta ctaaatgcca acaattatct ctgaattccc 3600 

tatccaggct cccctcattt caccttcagc atatattcta gtcatgaatt tccttcttca 3660 

cacaccccac atctctgggc tttgtgccag accatctcta acttaatcct ctcatccctg 3720 

ttcccctttc tccaaagaga tgaagctca 3749 



<210> SEQ ID NO 9 

<211> LENGTH: 1524 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 9 

atggggtgtt ggggtcggaa ccggggccgg ctgctgtgca tgctggcgct gaccttcatg 60 

ttcatggtgc tggaggtggt ggtgagccgg gtgacctcgt cgctggcgat gctctccgac 120 

tccttccaca tgctgtcgga cgtgctggcg ctggtggtgg cgctggtggc cgagcgcttc 180 

gcccggcgga cccacgccac ccagaagaac acgttcggct ggatccgagc cgaggtaatg 240 

ggggctctgg tgaacgccat cttcctgact ggcctctgtt tcgccatcct gctggaggcc 300 

atcgagcgct tcatcgagcc gcacgagatg cagcagccgc tggtggtcct tggggtcggc 360 

gtggccgggc tgctggtcaa cgtgctgggg ctctgcctct tccaccatca cagcggcttc 420 

agccaggact ccggccacgg ccactcgcac gggggtcacg gccacggcca cggcctcccc 480 

aaggggcctc gcgttaagag cacccgcccc gggagcagcg acatcaacgt ggccccgggc 540 

gagcagggtc ccgaccagga ggagaccaac accctggtgg ccaataccag caactccaac 600 

gggctgaaat tggaccccgc agacccagaa aaccccagaa gtggtgatac agtggaagta 660 

caagtgaatg gaaatcttgt cagagaacct gaccatatgg aactggaaga agatagggct 720 

ggacaactta acatgcgtgg agtttttctg catgtccttg gagatgcctt gggttcagtg 780 

attgtagtag taaatgcctt agtcttttac ttttcttgga aaggttgttc tgaaggggat 840 

ttttgtgtga atccatgttt ccctgacccc tgcaaagcat ttgtagaaat aattaatagt 900 

actcatgcat cactttatga ggctggtcct tgctgggtgc tatatttaga tccaactctt 960 

tgtgttgtaa tggtttgtat acttctttac acaacctatc cattacttaa ggaatctgct 1020 

cttattcttc tacaaactgt tcctaaacaa attgatatca gaaatttgat aaaagaactt 1080 

cgaaatgttg aaggagttga ggaagttcat gaattacatg tttggcaact tgctggaagc 1140 

agaatcattg ccactgctca cataaaatgt gaagatccaa catcatacat ggaggtggct 1200 

aaaaccatta aagacgtttt tcataatcac ggaattcacg ctactaccat tcagcctgaa 1260 

tttgctagtg taggctctaa atcaagtgta gttccgtgtg aacttgcctg cagaacccag 1320 

tgtgctttga agcaatgttg tgggacacta ccacaagccc cttatggaaa ggatgcagaa 1380 

aagaccccag cagttagcat ttcttgttta gaacttagta acaatctaga gaagaagccc 1440 

aggaggacta aagctgaaaa catccctgct gttgtgatag agattaaaaa catgccaaac 1500 

aaacaacctg aatcatcttt gtga 1524 



<210> SEQ ID NO 10 

<211> LENGTH: 507 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 10 

Met Gly Cys Trp Gly Arg Asn Arg Gly Arg Leu Leu Cys Met Leu Ala 
1 5 10 15 

Leu Thr Phe Met Phe Met Val Leu Glu Val Val Val Ser Arg Val Thr 
20 25 30 

Ser Ser Leu Ala Met Leu Ser Asp Ser Phe His Met Leu Ser Asp Val 
35 40 45 

Leu Ala Leu Val Val Ala Leu Val Ala Glu Arg Phe Ala Arg Arg Thr 
50 55 60 

His Ala Thr Gln Lys Asn Thr Phe Gly Trp Ile Arg Ala Glu Val Met 
65 70 75 80 

Gly Ala Leu Val Asn Ala Ile Phe Leu Thr Gly Leu Cys Phe Ala Ile 
85 90 95 

Leu Leu Glu Ala Ile Glu Arg Phe Ile Glu Pro His Glu Met Gln Gln 
100 105 110 

Pro Leu Val Val Leu Gly Val Gly Val Ala Gly Leu Leu Val Asn Val 
115 120 125 

Leu Gly Leu Cys Leu Phe His His His Ser Gly Phe Ser Gln Asp Ser 
130 135 140 

Gly His Gly His Ser His Gly Gly His Gly His Gly His Gly Leu Pro 
145 150 155 160 

Lys Gly Pro Arg Val Lys Ser Thr Arg Pro Gly Ser Ser Asp Ile Asn 
165 170 175 

Val Ala Pro Gly Glu Gln Gly Pro Asp Gln Glu Glu Thr Asn Thr Leu 
180 185 190 

Val Ala Asn Thr Ser Asn Ser Asn Gly Leu Lys Leu Asp Pro Ala Asp 
195 200 205 

Pro Glu Asn Pro Arg Ser Gly Asp Thr Val Glu Val Gln Val Asn Gly 
210 215 220 

Asn Leu Val Arg Glu Pro Asp His Met Glu Leu Glu Glu Asp Arg Ala 
225 230 235 240 

Gly Gln Leu Asn Met Arg Gly Val Phe Leu His Val Leu Gly Asp Ala 
245 250 255 

Leu Gly Ser Val Ile Val Val Val Asn Ala Leu Val Phe Tyr Phe Ser 
260 265 270 

Trp Lys Gly Cys Ser Glu Gly Asp Phe Cys Val Asn Pro Cys Phe Pro 
275 280 285 

Asp Pro Cys Lys Ala Phe Val Glu Ile Ile Asn Ser Thr His Ala Ser 
290 295 300 

Leu Tyr Glu Ala Gly Pro Cys Trp Val Leu Tyr Leu Asp Pro Thr Leu 
305 310 315 320 

Cys Val Val Met Val Cys Ile Leu Leu Tyr Thr Thr Tyr Pro Leu Leu 
325 330 335 

Lys Glu Ser Ala Leu Ile Leu Leu Gln Thr Val Pro Lys Gln Ile Asp 
340 345 350 

Ile Arg Asn Leu Ile Lys Glu Leu Arg Asn Val Glu Gly Val Glu Glu 
355 360 365 

Val His Glu Leu His Val Trp Gln Leu Ala Gly Ser Arg Ile Ile Ala 
370 375 380 

Thr Ala His Ile Lys Cys Glu Asp Pro Thr Ser Tyr Met Glu Val Ala 
385 390 395 400 

Lys Thr Ile Lys Asp Val Phe His Asn His Gly Ile His Ala Thr Thr 
405 410 415 

Ile Gln Pro Glu Phe Ala Ser Val Gly Ser Lys Ser Ser Val Val Pro 
420 425 430 

Cys Glu Leu Ala Cys Arg Thr Gln Cys Ala Leu Lys Gln Cys Cys Gly 
435 440 445 

Thr Leu Pro Gln Ala Pro Tyr Gly Lys Asp Ala Glu Lys Thr Pro Ala 
450 455 460 

Val Ser Ile Ser Cys Leu Glu Leu Ser Asn Asn Leu Glu Lys Lys Pro 
465 470 475 480 

Arg Arg Thr Lys Ala Glu Asn Ile Pro Ala Val Val Ile Glu Ile Lys 
485 490 495 

Asn Met Pro Asn Lys Gln Pro Glu Ser Ser Leu 
500 505 



<210> SEQ ID NO 11 

<211> LENGTH: 2222 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 11 

ctccggctgc ggctcttggt accccggctc cgggagccca gctccccgcc accgccgccg 60 

cctgggtgtg ggggctgctg aggctgagcc gggcttcggc gccggctctg aggacggacg 120 

cctgaggagc tgcgcggcgc ggcgccgccg gctggcggag aacgcccaca ggcgcggggc 180 

tcggcggctt gacccgggct tgtccccgtg cggccgcggg ggcccctcag cggtttcccg 240 

aacggcccga ctcgggcgct cctccgtgtc gcggtcgccg accctccgcg tcccgccaac 300 

gccgccgctg caccagtctc cgggccgggc tcggcgggcc ccgcagccgc agccatgggg 360 

tgttggggtc ggaaccgggg ccggctgctg tgcatgctgg cgctgacctt cntgttcatg 420 

gtgctggagg tggtggtgag ccgggtgacc tcgtcgctgg cgatgctctc cgactccttc 480 

cacatgctgt cggacgtgct ggcgctggtg gtggcgctgg tggccgagcg cttcgcccgg 540 

cggacccacg ccacccagaa gaacacgttc ggctggatcc gagccgaggt aatgggggct 600 

ctggtgaacg ccatcttcct gactggcctc tgtttcgcca tcctgctgga ggccatcgag 660 

cgcttcatcg agccgcacga gatgcagcag ccgctggtgg tccttggggt cggcgtggcc 720 

gggctgctgg tcaacgtgct ggggctctgc ctcttccacc atcacagcgg cttcagccag 780 

gactccggcc acggccactc gcacgggggt cacggccacg gccacggcct ccccaagggg 840 

cctcgcgtta agagcacccg ccccgggagc agcgacatca acgtggcccc gggcgagcag 900 

ggtcccgacc aggaggagac caacaccctg gtggccaata ccagcaactc caacgggctg 960 

aaattggacc ccgcagaccc agaaaacccc agaagtggtg atacagtgga agtacaagtg 1020 

aatggaaatc ttgtcagaga acctgaccat atggaactgg aagaagatag ggctggacaa 1080 

cttaacatgc gtggagtttt tctgcatgtc cttggagatg ccttgggttc agtgattgta 1140 

gtagtaaatg ccttagtctt ttacttttct tggaaaggtt gttctgaagg ggatttttgt 1200 

gtgaatccat gtttccctga cccctgcaaa gcatttgtag aaataattaa tagtactcat 1260 

gcatcacttt atgaggctgg tccttgctgg gtgctatatt tagatccaac tctttgtgtt 1320 

gtaatggttt gtatacttct ttacacaacc tatccattac ttaaggaatc tgctcttatt 1380 

cttctacaaa ctgttcctaa acaaattgat atcagaaatt tgataaaaga acttcgaaat 1440 

gttgaaggag ttgaggaagt tcatgaatta catgtttggc aacttgctgg aagcagaatc 1500 

attgccactg ctcacataaa atgtgaagat ccaacatcat acatggaggt ggctaaaacc 1560 

attaaagacg tttttcataa tcacggaatt cacgctacta ccattcagcc tgaatttgct 1620 

agtgtaggct ctaaatcaag tgtagttccg tgtgaacttg cctgcagaac ccagtgtgct 1680 

ttgaagcaat gttgtgggac actaccacaa gccccttatg gaaaggatgc agaaaagacc 1740 

ccagcagtta gcatttcttg tttagaactt agtaacaatc tagagaagaa gcccaggagg 1800 

actaaagctg aaaacatccc tgctgttgtg atagagatta aaaacatgcc aaacaaacaa 1860 

cctgaatcat ctttgtgagt cttgaaaaag atgtgatatt tgacttttgc tttaaactgc 1920 

aagaggaaaa agactccact gaaattctaa gtttgccaag tagtgtaatt gaagtccttg 1980 

tctggtcaca cagtttaatt ctatttttgt aagaacataa tgggactgca taacagagtt 2040 

ctatattaca atttgtgatt attagtacag agtacagcta tgctgtgact gttttggaaa 2100 

gccagtttta acactatgtt acatttttgt ttaaagtaag ttaaacctta tataacataa 2160 

tgacatttga tttctggatt tttcccatgg ataaaaaatt aggggggata aaattaaaat 2220 

tg 2222 



What is claimed is: 

1. An isolated nucleic acid molecule comprising a 
sequence that: 

(a) encodes the amino acid sequence shown in SEQ ID 
NO: 7; and 

(b) hybridizes under highly stringent conditions with 
wash conditions of 0.1xSSC/0.1%SDS at 68° C. to the 
nucleotide sequence of SEQ ID NO: 6 or the comple- 
ment thereof. 



2. An isolated nucleic acid molecule comprising a nucle- 
otide sequence that encodes the amino acid sequence shown 
in SEQ ID NO:7. 

3. A recombinant expression vector comprising the isolated 
nucleic acid molecule of claim 2. 

4. A host cell comprising the recombinant expression 
vector of claim 3. 

* * * * * 







(12) United States Patent 

Turner, Jr. et al. 



US006511840B1 

(10) Patent No.: US 6,511,840 B1 
(45) Date of Patent: Jan. 28, 2003 



(54) HUMAN KINASE PROTEINS AND POLYNUCLEOTIDES ENCODING THE SAME 

(75) Inventors: C. Alexander Turner, Jr., The 

Woodlands, TX (US); Brian Mathur, 
The Woodlands, TX (US); Daniel 
Mathur, Wooster, OH (US); Carl 
Johan Friddle, The Woodlands, TX 
(US) 

(73) Assignee: Lexicon Genetics Incorporated, The 
Woodlands, TX (US) 

( * ) Notice: Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 0 days. 

(21) Appl. No.: 09/883,134 

(22) Filed: Jun. 15, 2001 

Related U.S. Application Data 

(60) Provisional application No. 60/211,572, filed on Jun. 15, 
2000, and provisional application No. 60/216,382, filed on 
Jul. 7, 2000. 

(51) Int. Cl.7 C12N 1/20; C12N 15/00; C12N 9/12; C07H 21/04; C07H 21/02 

(52) U.S. Cl. 435/252.3; 435/320.1; 435/6; 435/194; 536/23.1; 536/23.2 

(58) Field of Search 536/23.2, 23.1; 435/6, 320.1, 252.3, 194 

(56) References Cited 

U.S. PATENT DOCUMENTS 

4,215,051 A 7/1980 Schroeder et al. 
4,376,110 A 3/1983 David et al. 
4,594,595 A 6/1986 Struckman 
4,631,211 A 12/1986 Houghten 
4,689,405 A 8/1987 Frank et al. 
4,713,326 A 12/1987 Dattagupta et al. 
4,873,191 A 10/1989 Wagner et al. 
4,946,778 A 8/1990 Ladner et al. 
5,252,743 A 10/1993 Barrett et al. 
5,424,186 A 6/1995 Fodor et al. 
5,445,934 A 8/1995 Fodor et al. 
5,459,127 A 10/1995 Felgner et al. 
5,556,752 A 9/1996 Lockhart et al. 
5,700,637 A 12/1997 Southern 
5,744,305 A 4/1998 Fodor et al. 
5,830,721 A 11/1998 Stemmer et al. 
5,837,458 A 11/1998 Minshull et al. 
5,869,336 A 2/1999 Meyer et al. 
5,877,397 A 3/1999 Lonberg et al. 
5,948,767 A 9/1999 Scheule et al. 
6,001,593 A 12/1999 Bandman et al. 
6,075,181 A 6/2000 Kucherlapati et al. 
6,110,490 A 8/2000 Thierry 
6,150,584 A 11/2000 Kucherlapati et al. 

OTHER PUBLICATIONS 

EST Database, Accession No. BE736116, Sep. 2000.* 

Hillier et al., EST Database, Accession No. AA088547, Jul. 
1997.* 

Bird et al., 1988, "Single-Chain Antigen-Binding Proteins", 
Science 242:423-426. 

Bitter et al., 1987, "Expression and Secretion Vectors for 
Yeast", Methods in Enzymology 153:516-544. 

Colbere-Garapin et al., 1981, "A New Dominant Hybrid 
Selective Marker for Higher Eukaryotic Cells", J. Mol. Biol. 
150:1-14. 

Gautier et al., 1987, "α-DNA IV: α-anomeric and β-anomeric 
tetrathymidylates covalently linked to intercalating 
oxazolopyridocarbazole. Synthesis, physiochemical properties 
and poly (rA) binding", Nucleic Acids Research 
15(16):6625-6641. 

Gordon, 1989, "Transgenic Animals", International Review 
of Cytology, 115:171-229. 

Greenspan et al., 1993, "Idiotypes: structure and 
immunogenicity", FASEB Journal 7:437-444. 

Gu et al., 1994, "Deletion of a DNA Polymerase β Gene 
Segment in T Cells Using Cell Type-Specific Gene 
Targeting", Science 265:103-106. 

Huse et al., 1989, "Generation of a Large Combinatorial 
Library of the Immunoglobulin Repertoire in Phage 
Lambda", Science 246:1275-1281. 

Huston et al., 1988, "Protein engineering of antibody binding 
sites: Recovery of specific activity in an anti-digoxin 
single-chain Fv analogue produced in Escherichia coli", Proc. 
Natl. Acad. Sci. USA 85:5879-5883. 

Inoue et al., 1987, "Sequence-dependent hydrolysis of RNA 
using modified oligonucleotides splints and RNase H", 
FEBS Letters 215(2):327-330. 

Inoue et al., 1987, "Synthesis and hybridization studies on 
two complementary nona(2'-O-methyl)ribonucleotides", 
Nucleic Acids Research 15(15):6131-6149. 

Inouye & Inouye, 1985, "Up-promoter mutations in the lpp 
gene of Escherichia coli", Nucleic Acids Research 
13(9):3101-3110. 

Janknecht et al., 1991, "Rapid and efficient purification of 
native histidine-tagged protein expressed by recombinant 
vaccinia virus", PNAS 88:8972-8976. 

Kohler & Milstein, 1975, "Continuous cultures of fused 
cells secreting antibody of predefined specificity", Nature 
256:495-497. 

Lakso et al., 1992, "Targeted oncogene activation by 
site-specific recombination in transgenic mice", Proc. Natl. 
Acad. Sci. USA 89:6232-6236. 

Lavitrano et al., 1989, "Sperm Cells as Vectors for 
Introducing Foreign DNA into Eggs: Genetic Transformation of 
Mice", Cell 57:717-723. 

Lo, 1983, "Transformation by Iontophoretic Microinjection 
of DNA: Multiple Integrations without Tandem Insertions", 
Mol. & Cell. Biology 3(10):1803-1814. 

(List continued on next page.) 

Primary Examiner: Rebecca E. Prouty 
Assistant Examiner: Maryam Monshipouri 



(57) ABSTRACT 



Novel human polynucleotide and polypeptide sequences are 
disclosed that can be used in therapeutic, diagnostic, and 
pharmacogenomic applications. 

5 Claims, No Drawings 



App Serial # 09/854,844 Exhibit Q 

Hu et al. LEX-0176-USA 

Novel Human Protease and Polynucleotides Encoding the Same 




OTHER PUBLICATIONS 

Logan et al., 1984, "Adenovirus tripartite leader sequence 
enhances translation of mRNAs late after infection", Proc. 
Natl. Acad. Sci. USA 81:3655-3659. 

Lowy et al., 1980, "Isolation of Transforming DNA Cloning 
the Hamster aprt Gene", Cell 22:817-823. 

Morrison et al., 1984, "Chimeric human antibody molecules: 
Mouse antigen-binding domains with human constant 
region domains", Proc. Natl. Acad. Sci. USA 81:6851-6855. 

Mulligan & Berg, 1981, "Selection for animal cells that 
express the Escherichia coli gene coding for 
xanthine-guanine phosphoribosyltransferase", Proc. Natl. 
Acad. Sci. USA 78(4):2072-2076. 

Neuberger et al., 1984, "Recombinant antibodies possessing 
novel effector functions," Nature 312:604-608. 

Nisonoff, 1991, "Idiotypes: Concepts and Applications", J. 
of Immunology 147:2429-2438. 

O'Hare et al., 1981, "Transformation of mouse fibroblasts to 
methotrexate resistance by a recombinant plasmid 
expressing a prokaryotic dihydrofolate reductase", Proc. Natl. 
Acad. Sci. USA 78(3):1527-1531. 

Ruther et al., 1983, "Easy identification of cDNA clones", 
EMBO Journal 2(10):1791-1794. 

Santerre et al., 1984, "Expression of prokaryotic genes for 
hygromycin B and G418 resistance as dominant-selection 
markers in mouse L cells", Gene 30:147-156. 

Sarin et al., 1988, "Inhibition of acquired immunodeficiency 
syndrome virus by oligodeoxynucleoside 
methylphosphonates", Proc. Natl. Acad. Sci. USA 85:7448-7451. 

Smith et al., 1983, "Molecular Engineering of the 
Autographa californica Nuclear Polyhedrosis Virus 
Genome: Deletion Mutations within the Polyhedrin Gene", 
J. Virol. 46(2):584-593. 

Stein et al., 1988, "Physiochemical properties of 
phosphorothioate oligodeoxynucleotides", Nucleic Acids Research 
16(8):3209-3221. 

Szybalska & Szybalski, 1962, "Genetics of Human Cell 
Lines, IV. DNA-Mediated Heritable Transformation of a 
Biochemical Trait", Proc. Natl. Acad. Sci. USA 
48:2026-2034. 

Takeda et al., 1985, "Construction of chimaeric processed 
immunoglobulin genes containing mouse variable and 
human constant region sequences", Nature 314:452-454. 

Thompson et al., 1989, "Germ Line Transmission and 
Expression of a Corrected HPRT Gene Produced by Gene 
Targeting in Embryonic Stem Cells", Cell 56:313-321. 

Van Der Putten et al., 1985, "Efficient insertion of genes into 
the mouse germ line via retroviral vectors", Proc. Natl. 
Acad. Sci. USA 82:6148-6152. 

Van Heeke et al., 1989, "Expression of Human Asparagine 
Synthetase in Escherichia coli", J. Biol. Chemistry 
264(10):5503-5509. 

Ward et al., 1989, "Binding activities of a repertoire of single 
immunoglobulin variable domains secreted from 
Escherichia coli", Nature 341:544-546. 

Wigler et al., 1977, "Transfer of Purified Herpes Virus 
Thymidine Kinase Gene to Cultured Mouse Cells", Cell 
11:223-232. 

Wigler et al., 1980, "Transformation of mammalian cells 
with an amplifiable dominant-acting gene", Proc. Natl. 
Acad. Sci. USA 77(6):3567-3570. 

* cited by examiner 




HUMAN KINASE PROTEINS AND 
POLYNUCLEOTIDES ENCODING THE 
SAME 

The present application claims the benefit of U.S. Provisional 
Application Nos. 60/211,572 and 60/216,382 which 
were filed on Jun. 15, 2000 and Jul. 7, 2000, respectively. 
These U.S. Provisional Applications are herein incorporated 
by reference in their entirety. 

INTRODUCTION 

The present invention relates to the discovery, 
identification, and characterization of novel human poly- 
nucleotides encoding a protein that shares sequence simi- 
larity with animal kinases. The invention encompasses the 
described polynucleotides, host cell expression systems, the 
encoded proteins, fusion proteins, polypeptides and 
peptides, antibodies to the encoded proteins and peptides, 
and genetically engineered animals that either lack or over 
express the disclosed polynucleotides, antagonists and ago- 
nists of the proteins, and other compounds that modulate the 
expression or activity of the proteins encoded by the dis- 
closed polynucleotides that can be used for diagnosis, drug 
screening, clinical trial monitoring, the treatment of physi- 
ological disorders or diseases, and cosmetic or nutriceutical 
applications. 

BACKGROUND OF THE INVENTION 

Kinases mediate phosphorylation of a wide variety of 
proteins and compounds in the cell. Along with 
phosphatases, kinases are involved in a range of regulatory 
pathways. Given the physiological importance of kinases, 
they have been subject to intense scrutiny and are proven 
drug targets. 

SUMMARY OF THE INVENTION 

The present invention relates to the discovery, 
identification, and characterization of nucleotides that 
encode novel human proteins, and the corresponding amino 
acid sequences of these proteins. The novel human proteins 
(NHPs) described for the first time herein share structural 
similarity with animal kinases, including, but not limited to 
myosin kinases and unconventional myosin classes of proteins 
(SEQ ID NOS: 1-5) as well as serine-threonine kinases, 
calcium/calmodulin-dependent kinases and MAP kinases 
(SEQ ID NOS: 6-11). As such, the novel polynucleotides 
encode a new kinase protein having homologies and 
orthologs across a range of phyla and species. 

The novel human polynucleotides described herein, 
encode open reading frames (ORFs) encoding proteins of 
238, 1,236, 974, 922 and 255 amino acids in length (see 
respectively SEQ ID NOS: 2, 4, 7, 9, 11). 

The invention also encompasses agonists and antagonists 
of the described NHPs, including small molecules, large 
molecules, mutant NHPs, or portions thereof, that compete 
with native NHP, peptides, and antibodies, as well as nucle- 
otide sequences that can be used to inhibit the expression of 
the described NHPs (e.g., antisense and ribozyme 
molecules, and gene or regulatory sequence replacement 
constructs) or to enhance the expression of the described 
NHP polynucleotides (e.g., expression constructs that place 
the described polynucleotide under the control of a strong 
promoter system), and transgenic animals that express a 
NHP transgene, or "knock-outs" (which can be conditional) 
that do not express a functional NHP. Knock-out mice can 
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be produced in several ways, one of which involves the use 
of mouse embryonic stem cells ("ES cells") lines that 
contain gene trap mutations in a murine homolog of at least 
one of the described NHPs. When the unique NHP 
sequences described in SEQ ID NOS: 1-11 are "knocked-out" 
they provide a method of identifying phenotypic 
expression of the particular gene as well as a method of 
assigning function to previously unknown genes. 
Additionally, the unique NHP sequences described in SEQ 
ID NOS: 1-11 are useful for the identification of coding 
sequence, the identification of the actual biologically 
relevant exon splice junctions and the mapping of a unique 
gene to a particular chromosome. 

Further, the present invention also relates to processes for 
identifying compounds that modulate, i.e., act as agonists or 
antagonists, of NHP expression and/or NHP activity that 
utilize purified preparations of the described NHPs and/or 
NHP product, or cells expressing the same. Such compounds 
can be used as therapeutic agents for the treatment of any of 
a wide variety of symptoms associated with biological 
disorders or imbalances. 

DESCRIPTION OF THE SEQUENCE LISTING 
AND FIGURES 

The Sequence Listing provides the sequence of the novel 
human ORFs encoding the described novel human kinase 
proteins. SEQ ID NO:5 describes a full length NHP ORF 
and flanking regions. 

DETAILED DESCRIPTION OF THE 
INVENTION 

The NHPs, described for the first time herein, are novel 
proteins that are widely expressed. NHP SEQ ID NOS: 1-5 are 
expressed in, inter alia, human cell lines, and human 
pituitary, lymph node, kidney, testis, thyroid, fetal kidney, 
and gene trapped cells. NHP SEQ ID NOS: 1-5 were compiled 
from gene trapped sequences in conjunction with sequences 
available in GENBANK, and cDNAs from a kidney mRNA 
(Edge Biosystems, Gaithersburg, Md.). 

The NHPs, described for the first time in NHP SEQ ID 
NOS: 6-11 are novel proteins expressed in, inter alia, human 
cell lines, and human pituitary, lymph node, kidney, colon, 
and prostate cells. NHP SEQ ID NOS: 6-11 were compiled from 
sequences available in GENBANK, and cDNAs generated 
from kidney, prostate, and colon mRNA (Edge Biosystems, 
Gaithersburg, Md.). 

The present invention encompasses the nucleotides presented 
in the Sequence Listing, host cells expressing such 
nucleotides, the expression products of such nucleotides, 
and: (a) nucleotides that encode mammalian homologs of 
the described polynucleotides, including the specifically 
described NHPs, and the NHP products; (b) nucleotides that 
encode one or more portions of the NHPs that correspond to 
functional domains, and the polypeptide products specified 
by such nucleotide sequences, including but not limited to 
the novel regions of any active domain(s); (c) isolated 
nucleotides that encode mutant versions, engineered or 
naturally occurring, of the described NHPs in which all or a 
part of at least one domain is deleted or altered, and the 
polypeptide products specified by such nucleotide 
sequences, including but not limited to soluble proteins and 
peptides in which all or a portion of the signal (or 
hydrophobic transmembrane) sequence is deleted; (d) nucleotides 
that encode chimeric fusion proteins containing all or a 
portion of a coding region of an NHP, or one of its domains 
(e.g., a receptor or ligand binding domain, accessory protein/ 
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self-association domain, etc.) fused to another peptide or 
polypeptide; or (e) therapeutic or diagnostic derivatives of 
the described polynucleotides such as oligonucleotides, anti- 
sense polynucleotides, ribozymes, dsRNA, or gene therapy 
constructs comprising a sequence first disclosed in the 
Sequence Listing. 

As discussed above, the present invention includes: (a) 
the human DNA sequences presented in the Sequence Listing 
(and vectors comprising the same) and additionally 
contemplates any nucleotide sequence encoding a contiguous 
NHP open reading frame (ORF) that hybridizes to a 
complement of a DNA sequence presented in the Sequence 
Listing under highly stringent conditions, e.g., hybridization 
to filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl 
sulfate (SDS), 1 mM EDTA at 65° C., and washing in 
0.1xSSC/0.1% SDS at 68° C. (Ausubel F. M. et al., eds., 
1989, Current Protocols in Molecular Biology, Vol. I, Green 
Publishing Associates, Inc., and John Wiley & Sons, Inc., 
New York, at p. 2.10.3) and encodes a functionally equivalent 
gene product. Additionally contemplated are any nucleotide 
sequences that hybridize to the complement of a DNA 
sequence that encodes and expresses an amino acid 
sequence presented in the Sequence Listing under moderately 
stringent conditions, e.g., washing in 0.2xSSC/0.1% 
SDS at 42° C. (Ausubel et al., 1989, supra), yet still encodes 
a functionally equivalent NHP product. Functional equivalents 
of a NHP include naturally occurring NHPs present in 
other species and mutant NHPs whether naturally occurring 
or engineered (by site directed mutagenesis, gene shuffling, 
directed evolution as described in, for example, U.S. Pat. 
No. 5,837,458). The invention also includes degenerate 
nucleic acid variants of the disclosed NHP polynucleotide 
sequences. 

Additionally contemplated are polynucleotides encoding 
NHP ORFs, or their functional equivalents, encoded by 
polynucleotide sequences that are about 99, 95, 90, or about 
85 percent similar or identical to corresponding regions of 
the nucleotide sequences of the Sequence Listing (as measured 
by BLAST sequence comparison analysis using, for 
example, the GCG sequence analysis package using standard 
default settings). 
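
As a rough illustration of the percent-identity figures mentioned above, identity over an ungapped comparison of two equal-length sequences could be computed as in the sketch below. This is not the BLAST or GCG algorithm, which aligns sequences and scores gaps; the function name and the two toy fragments are assumptions made only for the illustration.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length for an ungapped comparison")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    # Compare case-insensitively, position by position.
    matches = sum(1 for a, b in zip(seq_a.lower(), seq_b.lower()) if a == b)
    return 100.0 * matches / len(seq_a)


# Hypothetical 20-base fragments that differ at two positions.
reference = "atggggtgttggggtcggaa"
variant = "atggggtgtaggggtcgcaa"
identity = percent_identity(reference, variant)
```

Under this simplified measure, the toy variant falls at the 90 percent level, one of the thresholds recited above.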

The invention also includes nucleic acid molecules, preferably 
DNA molecules, that hybridize to, and are therefore 
the complements of, the described NHP nucleotide 
sequences. Such hybridization conditions may be highly 
stringent or less highly stringent, as described above. In 
instances where the nucleic acid molecules are 
deoxyoligonucleotides ("DNA oligos"), such molecules are generally 
about 16 to about 100 bases long, or about 20 to about 80, 
or about 34 to about 45 bases long, or any variation or 
combination of sizes represented therein that incorporate a 
contiguous region of sequence first disclosed in the 
Sequence Listing. Such oligonucleotides can be used in 
conjunction with the polymerase chain reaction (PCR) to 
screen libraries, isolate clones, and prepare cloning and 
sequencing templates, etc. 

Alternatively, such NHP oligonucleotides can be used as 
hybridization probes for screening libraries, and assessing 
gene expression patterns (particularly using a microarray or 
high-throughput "chip" format). Additionally, a series of the 
described NHP oligonucleotide sequences, or the comple- 
ments thereof, can be used to represent all or a portion of the 
described NHP sequences. An oligonucleotide or polynucle- 
otide sequence first disclosed in at least a portion of one or 
more of the sequences of SEQ ID NOS: 1-11 can be used 
as a hybridization probe in conjunction with a solid support 
matrix/substrate (resins, beads, membranes, plastics, 
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polymers, metal or metallized substrates, crystalline or poly- 
crystalline substrates, etc.). Of particular note are spatially 
addressable arrays (i.e., gene chips, microtiter plates, etc.) of 
oligonucleotides and polynucleotides, or corresponding oli- 
gopeptides and polypeptides, wherein at least one of the 
biopolymers present on the spatially addressable array com- 
prises an oligonucleotide or polynucleotide sequence first 
disclosed in at least one of the sequences of SEQ ID NOS: 
1-11, or an amino acid sequence encoded thereby. Methods 
for attaching biopolymers to, or synthesizing biopolymers 
on, solid support matrices, and conducting binding studies 
thereon are disclosed in, inter alia, U.S. Pat. Nos. 5,700,637, 
5,556,752, 5,744,305, 4,631,211, 5,445,934, 5,252,743, 
4,713,326, 5,424,186, and 4,689,405 the disclosures of 
which are herein incorporated by reference in their entirety. 

Addressable arrays comprising sequences first disclosed 
in SEQ ID NOS: 1-11 can be used to identify and charac- 
terize the temporal and tissue specific expression of a gene. 
These addressable arrays incorporate oligonucleotide 
sequences of sufficient length to confer the required 
specificity, yet be within the limitations of the production 
technology. The length of these probes is within a range of 
between about 8 to about 2000 nucleotides. Preferably the 
probes consist of 60 nucleotides and more preferably 25 
nucleotides from the sequences first disclosed in SEQ ID 
NOS:1-11. 

For example, a series of the described oligonucleotide 
sequences, or the complements thereof, can be used in chip 
format to represent all or a portion of the described 
sequences. The oligonucleotides, typically between about 16 
to about 40 (or any whole number within the stated range) 
nucleotides in length can partially overlap each other and/or 
the sequence may be represented using oligonucleotides that 
do not overlap. 
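
The tiling described above can be sketched as follows. This is a minimal illustration only: the function name, the 25-nucleotide length, the 10-nucleotide step, and the toy sequence are assumptions chosen for the example, not values taken from the specification beyond the stated range of about 16 to about 40 nucleotides.

```python
def tile_oligos(sequence: str, length: int = 25, step: int = 10) -> list[tuple[int, str]]:
    """Return (1-based start, oligo) pairs that cover `sequence` with fixed-length windows.

    A step smaller than the length gives partially overlapping oligos; a step
    equal to or larger than the length gives non-overlapping ones, except that
    a final window is added, if needed, so the 3' end is always represented.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) < length:
        return []
    last_start = len(sequence) - length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the end of the sequence is covered
    return [(s + 1, sequence[s:s + length]) for s in starts]


# Hypothetical 60-base toy sequence, tiled with 25-mers every 10 bases.
toy_sequence = "acgt" * 15
oligos = tile_oligos(toy_sequence, length=25, step=10)
```

Setting `step` equal to `length` approximates the non-overlapping representation; a smaller step gives the partially overlapping one.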

Accordingly, the described polynucleotide sequences 
shall typically comprise at least about two or three distinct 
oligonucleotide sequences of at least about 8 nucleotides in 
length that are each first disclosed in the described Sequence 
Listing. Such oligonucleotide sequences can begin at any 
nucleotide present within a sequence in the Sequence Listing 
and proceed in either a sense (5'-to-3') orientation vis-a-vis 
the described sequence or in an antisense orientation. 
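
To make the sense and antisense orientations concrete, the sketch below extracts an oligonucleotide beginning at any position of a sequence and, on request, returns its reverse complement. The helper names and the short example sequence are hypothetical and serve only as an illustration.

```python
# Translation table for complementing DNA bases (n is kept as an unknown base).
_COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def reverse_complement(seq: str) -> str:
    """Return the antisense (reverse-complement) strand, written 5'-to-3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def oligo_at(sequence: str, start: int, length: int, antisense: bool = False) -> str:
    """Return the oligo of `length` bases beginning at 1-based position `start`."""
    if start < 1 or length < 1 or start - 1 + length > len(sequence):
        raise ValueError("requested oligo falls outside the sequence")
    sense = sequence[start - 1:start - 1 + length]
    return reverse_complement(sense) if antisense else sense


# Hypothetical 10-base example sequence.
example = "aacggcccga"
sense_oligo = oligo_at(example, start=3, length=8)
antisense_oligo = oligo_at(example, start=3, length=8, antisense=True)
```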

Microarray-based analysis allows the discovery of broad 
patterns of genetic activity, providing new understanding of 
gene functions and generating novel and unexpected insight 
into transcriptional processes and biological mechanisms. 
The use of addressable arrays comprising sequences first 
disclosed in SEQ ID NOS: 1-11 provides detailed informa- 
tion about transcriptional changes involved in a specific 
pathway, potentially leading to the identification of novel 
components or gene functions that manifest themselves as 
novel phenotypes. 

Probes consisting of sequences first disclosed in SEQ ID 
NOS: 1-11 can also be used in the identification, selection 
and validation of novel molecular targets for drug discovery. 
The use of these unique sequences permits the direct con- 
firmation of drug targets and recognition of drug dependent 
changes in gene expression that are modulated through 
pathways distinct from the drug's intended target. These 
unique sequences therefore also have utility in defining and 
monitoring both drug action and toxicity. 

As an example of utility, the sequences first disclosed in 
SEQ ID NOS: 1-11 can be utilized in microarrays or other 
assay formats, to screen collections of genetic material from 
patients who have a particular medical condition. These 
investigations can also be carried out using the sequences 
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first disclosed in SEQ ID NOS:1-11 in silico and by 
comparing previously collected genetic databases and the 
disclosed sequences using computer software known to 
those in the art. 

Thus the sequences first disclosed in SEQ ID NOS:1-11 
can be used to identify mutations associated with a particular 
disease and also as a diagnostic or prognostic assay. 

Although the presently described sequences have been 
specifically described using nucleotide sequence, it should 
be appreciated that each of the sequences can uniquely be 
described using any of a wide variety of additional structural 
attributes, or combinations thereof. For example, a given 
sequence can be described by the net composition of the 
nucleotides present within a given region of the sequence in 
conjunction with the presence of one or more specific 
oligonucleotide sequence(s) first disclosed in the SEQ ID 
NOS: 1-11. Alternatively, a restriction map specifying the 
relative positions of restriction endonuclease digestion sites, 
or various palindromic or other specific oligonucleotide 
sequences can be used to structurally describe a given 
sequence. Such restriction maps, which are typically generated 
by widely available computer programs (e.g., the University 
of Wisconsin GCG sequence analysis package, 
SEQUENCHER 3.0, Gene Codes Corp., Ann Arbor, Mich., 
etc.) can optionally be used in conjunction with one or more 
discrete nucleotide sequence(s) present in the sequence that 
can be described by the relative position of the sequence 
relative to one or more additional sequence(s) or one or more 
restriction sites present in the disclosed sequence. 

For oligonucleotide probes, highly stringent conditions 
may refer, e.g., to washing in 6xSSC/0.05% sodium 
pyrophosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base 
oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base 
oligos). These nucleic acid molecules may encode or act as 
NHP gene antisense molecules, useful, for example, in NHP 
gene regulation (for and/or as antisense primers in 
amplification reactions of NHP gene nucleic acid sequences). With 
respect to NHP gene regulation, such techniques can be used 
to regulate biological functions. Further, such sequences 
may be used as part of ribozyme and/or triple helix 
sequences that are also useful for NHP gene regulation. 

Inhibitory antisense or double stranded oligonucleotides 
can additionally comprise at least one modified base moiety 
which is selected from the group including but not limited to 
5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, 
hypoxanthine, xantine, 4-acetylcytosine, 
5-(carboxyhydroxylmethyl) uracil, 
5-carboxymethylaminomethyl-2-thiouridine, 
5-carboxymethylaminomethyluracil, dihydrouracil, 
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 
2-methyladenine, 2-methylguanine, 3-methylcytosine, 
5-methylcytosine, N6-adenine, 7-methylguanine, 
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil, 
beta-D-mannosylqueosine, 
5'-methoxycarboxymethyluracil, 5-methoxyuracil, 
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic 
acid (v), wybutoxosine, pseudouracil, queosine, 
2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid 
methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil, 
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, 
and 2,6-diaminopurine. 

The antisense oligonucleotide can also comprise at least 
one modified sugar moiety selected from the group including 
but not limited to arabinose, 2-fluoroarabinose, xylulose, 
and hexose. 

In yet another embodiment, the antisense oligonucleotide 
will comprise at least one modified phosphate backbone 
selected from the group consisting of a phosphorothioate, a 
phosphorodithioate, a phosphoramidothioate, a 
phosphoramidate, a phosphordiamidate, a 
methylphosphonate, an alkyl phosphotriester, and a 
formacetal or analog thereof. 

In yet another embodiment, the antisense oligonucleotide 
is an α-anomeric oligonucleotide. An α-anomeric 
oligonucleotide forms specific double-stranded hybrids with 
complementary RNA in which, contrary to the usual β-units, 
the strands run parallel to each other (Gautier et al., 1987, 
Nucl. Acids Res. 15:6625-6641). The oligonucleotide is a 
2'-O-methylribonucleotide (Inoue et al., 1987, Nucl. Acids 
Res. 15:6131-6148), or a chimeric RNA-DNA analogue 
(Inoue et al., 1987, FEBS Lett. 215:327-330). Alternatively, 
double stranded RNA can be used to disrupt the expression 
and function of a targeted NHP. 

Oligonucleotides of the invention can be synthesized by 
standard methods known in the art, e.g. by use of an 
automated DNA synthesizer (such as are commercially 
available from Biosearch, Applied Biosystems, etc.). As 
examples, phosphorothioate oligonucleotides can be 
synthesized by the method of Stein et al. (1988, Nucl. Acids Res. 
16:3209), and methylphosphonate oligonucleotides can be 
prepared by use of controlled pore glass polymer supports 
(Sarin et al., 1988, Proc. Natl. Acad. Sci. U.S.A. 
85:7448-7451), etc. 

Low stringency conditions are well known to those of 
skill in the art, and will vary predictably depending on the 
specific organisms from which the library and the labeled 
sequences are derived. For guidance regarding such 
conditions see, for example, Sambrook et al., 1989, Molecular 
Cloning, A Laboratory Manual (and periodic updates 
thereof), Cold Springs Harbor Press, N.Y.; and Ausubel et 
al., 1989, Current Protocols in Molecular Biology, Green 
Publishing Associates and Wiley Interscience, N.Y. 

Alternatively, suitably labeled NHP nucleotide probes can 
be used to screen a human genomic library using 
appropriately stringent conditions or by PCR. The identification and 
characterization of human genomic clones is helpful for 
identifying polymorphisms (including, but not limited to, 
nucleotide repeats, microsatellite alleles, single nucleotide 
polymorphisms, or coding single nucleotide 
polymorphisms), determining the genomic structure of a 
given locus/allele, and designing diagnostic tests. For 
example, sequences derived from regions adjacent to the 
intron/exon boundaries of the human gene can be used to 
design primers for use in amplification assays to detect 
mutations within the exons, introns, splice sites (e.g., splice 
acceptor and/or donor sites), etc., that can be used in 
diagnostics and pharmacogenomics. 

Further, a NHP gene homolog can be isolated from 
nucleic acid from an organism of interest by performing 
PCR using two degenerate or "wobble" oligonucleotide 
primer pools designed on the basis of amino acid sequences 
within the NHP products disclosed herein. The template for 
the reaction may be total RNA, mRNA, and/or cDNA 
obtained by reverse transcription of mRNA prepared from 
human or non-human cell lines or tissue known or suspected 
to express an allele of a NHP gene. 

The PCR product can be subcloned and sequenced to 
ensure that the amplified sequences represent the sequence 
of the desired NHP gene. The PCR fragment can then be 
used to isolate a full length cDNA clone by a variety of 
methods. For example, the amplified fragment can be 
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labeled and used to screen a cDNA library, such as a 
bacteriophage cDNA library. Alternatively, the labeled fragment 
can be used to isolate genomic clones via the screening 
of a genomic library. 

PCR technology can also be used to isolate full length 
cDNA sequences. For example, RNA can be isolated, following 
standard procedures, from an appropriate cellular or 
tissue source (i.e., one known, or suspected, to express a 
NHP gene). A reverse transcription (RT) reaction can be 
performed on the RNA using an oligonucleotide primer 
specific for the most 5' end of the amplified fragment for the 
priming of first strand synthesis. The resulting RNA/DNA 
hybrid may then be "tailed" using a standard terminal 
transferase reaction, the hybrid may be digested with RNase 
H, and second strand synthesis may then be primed with a 
complementary primer. Thus, cDNA sequences upstream of 
the amplified fragment can be isolated. For a review of 
cloning strategies that can be used, see e.g., Sambrook et al., 
1989, supra. 

A cDNA encoding a mutant NHP gene can be isolated, for
example, by using PCR. In this case, the first cDNA strand
may be synthesized by hybridizing an oligo-dT oligonucleotide
to mRNA isolated from tissue known or suspected to
be expressed in an individual putatively carrying a mutant
NHP allele, and by extending the new strand with reverse
transcriptase. The second strand of the cDNA is then
synthesized using an oligonucleotide that hybridizes specifically
to the 5' end of the normal gene. Using these two
primers, the product is then amplified via PCR, optionally
cloned into a suitable vector, and subjected to DNA
sequence analysis through methods well known to those of
skill in the art. By comparing the DNA sequence of the
mutant NHP allele to that of a corresponding normal NHP
allele, the mutation(s) responsible for the loss or alteration
of function of the mutant NHP gene product can be
ascertained.
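
The comparison step described above can be illustrated with a minimal sketch. It assumes the normal and mutant sequences have already been aligned to equal length, so that only base substitutions are reported; the function name and the short example fragments are hypothetical and are not taken from the Sequence Listing.

```python
def point_differences(normal: str, mutant: str):
    """Return (1-based position, normal base, mutant base) for each mismatch.

    Assumes both sequences are pre-aligned and of equal length, so only
    substitutions are found; insertions and deletions would need an aligner.
    """
    if len(normal) != len(mutant):
        raise ValueError("sequences must be pre-aligned to the same length")
    return [
        (i + 1, n, m)
        for i, (n, m) in enumerate(zip(normal.upper(), mutant.upper()))
        if n != m
    ]


# Hypothetical 12-base fragments used only for illustration.
normal_fragment = "ATGAAAGGTCTG"
mutant_fragment = "ATGGAAGGTCTG"
print(point_differences(normal_fragment, mutant_fragment))  # [(4, 'A', 'G')]
```

Here the single A-to-G difference at position 4 changes the second codon from AAA to GAA, which is the kind of change that sequence analysis of a cloned mutant allele would reveal.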

Alternatively, a genomic library can be constructed using
DNA obtained from an individual suspected of or known to
carry a mutant NHP allele (e.g., a person manifesting a
NHP-associated phenotype such as, for example, obesity,
high blood pressure, connective tissue disorders, infertility,
etc.), or a cDNA library can be constructed using RNA from
a tissue known, or suspected, to express a mutant NHP
allele. A normal NHP gene, or any suitable fragment thereof,
can then be labeled and used as a probe to identify the
corresponding mutant NHP allele in such libraries. Clones
containing mutant NHP gene sequences can then be purified
and subjected to sequence analysis according to methods
well known to those skilled in the art.

Additionally, an expression library can be constructed
utilizing cDNA synthesized from, for example, RNA
isolated from a tissue known, or suspected, to express a mutant
NHP allele in an individual suspected of or known to carry
such a mutant allele. In this manner, gene products made by
the putatively mutant tissue can be expressed and screened
using standard antibody screening techniques in conjunction
with antibodies raised against a normal NHP product, as
described below. (For screening techniques, see, for
example, Harlow, E. and Lane, eds., 1988, "Antibodies: A
Laboratory Manual", Cold Spring Harbor Press, Cold Spring
Harbor, N.Y.).

Additionally, screening can be accomplished by screening 
with labeled NHP fusion proteins, such as, for example, 
alkaline phosphatase-NHP or NHP-alkaline phosphatase
fusion proteins. In cases where a NHP mutation results in an
expressed gene product with altered function (e.g., as a
result of a missense or a frameshift mutation), polyclonal
antibodies to a NHP are likely to cross-react with a corre- 
sponding mutant NHP gene product. Library clones detected 
via their reaction with such labeled antibodies can be 
purified and subjected to sequence analysis according to 
methods well known in the art. 

The invention also encompasses (a) DNA vectors that 
contain any of the foregoing NHP coding sequences and/or 
their complements (i.e., antisense); (b) DNA expression 
vectors that contain any of the foregoing NHP coding
sequences operatively associated with a regulatory element 
that directs the expression of the coding sequences (for 
example, baculovirus as described in U.S. Pat. No. 5,869,336
herein incorporated by reference); (c) genetically
engineered host cells that contain any of the foregoing NHP
coding sequences operatively associated with a regulatory 
element that directs the expression of the coding sequences 
in the host cell; and (d) genetically engineered host cells that 
express an endogenous NHP gene under the control of an 
exogenously introduced regulatory element (i.e., gene 
activation). As used herein, regulatory elements include, but 
are not limited to, inducible and non-inducible promoters, 
enhancers, operators and other elements known to those 
skilled in the art that drive and regulate expression. Such 
regulatory elements include but are not limited to the 
cytomegalovirus (hCMV) immediate early gene, 
regulatable, viral elements (particularly retroviral LTR 
promoters), the early or late promoters of SV40 adenovirus,
the lac system, the trp system, the TAC system, the TRC 
system, the major operator and promoter regions of phage 
lambda, the control regions of fd coat protein, the promoter 
for 3-phosphoglycerate kinase (PGK), the promoters of acid 
phosphatase, and the promoters of the yeast α-mating
factors.

The present invention also encompasses antibodies and 
anti-idiotypic antibodies (including Fab fragments),
antagonists and agonists of the NHP, as well as compounds or
nucleotide constructs that inhibit expression of a NHP gene 
(transcription factor inhibitors, antisense and ribozyme 
molecules, or gene or regulatory sequence replacement 
constructs), or promote the expression of a NHP (e.g., 
expression constructs in which NHP coding sequences are 
operatively associated with expression control elements 
such as promoters, promoter/enhancers, etc.). 

The NHPs or NHP peptides, NHP fusion proteins, NHP 
nucleotide sequences, antibodies, antagonists and agonists 
can be useful for the detection of mutant NHPs or inappro- 
priately expressed NHPs for the diagnosis of disease. The 
NHP proteins or peptides, NHP fusion proteins, NHP nucle- 
otide sequences, host cell expression systems, antibodies, 
antagonists, agonists and genetically engineered cells and 
animals can be used for screening for drugs (or high 
throughput screening of combinatorial libraries) effective in 
the treatment of the symptomatic or phenotypic manifesta- 
tions of perturbing the normal function of NHP in the body. 
The use of engineered host cells and/or animals may offer an 
advantage in that such systems allow not only for the 
identification of compounds that bind to the endogenous 
receptor for an NHP, but can also identify compounds that 
trigger NHP-mediated activities or pathways. 

Finally, the NHP products can be used as therapeutics. For 
example, soluble derivatives such as NHP peptides/domains 
corresponding to NHPs, NHP fusion protein products 
(especially NHP-Ig fusion proteins, i.e., fusions of a NHP, or 
a domain of a NHP, to an IgFc), NHP antibodies and 
anti-idiotypic antibodies (including Fab fragments),
antagonists or agonists (including compounds that modulate or act
on downstream targets in a NHP-mediated pathway) can be
used to directly treat diseases or disorders. For instance, the
administration of an effective amount of soluble NHP, or a
NHP-IgFc fusion protein or an anti-idiotypic antibody (or its
Fab) that mimics the NHP could activate or effectively
antagonize the endogenous NHP receptor. Nucleotide
constructs encoding such NHP products can be used to
genetically engineer host cells to express such products in vivo;
these genetically engineered cells function as "bioreactors"
in the body delivering a continuous supply of a NHP, a NHP
peptide, or a NHP fusion protein to the body. Nucleotide
constructs encoding functional NHPs, mutant NHPs, as well
as antisense and ribozyme molecules can also be used in
"gene therapy" approaches for the modulation of NHP
expression. Thus, the invention also encompasses
pharmaceutical formulations and methods for treating biological
disorders.

Various aspects of the invention are described in greater
detail in the subsections below.

The NHP Sequences

The cDNA sequences and corresponding deduced amino
acid sequences of the described NHPs are presented in the
Sequence Listing.

Expression analysis has provided evidence that the NHPs
described in SEQ ID NO: 1-5 can be expressed in a
relatively narrow range of human tissues. In addition to
myosin III kinases, the NHPs described in SEQ ID NO: 1-5
also share significant similarity to a range of additional
kinase families, including kinases associated with signal
transduction, from a variety of phyla and species.

A number of polymorphisms can occur in the NHPs
described in SEQ ID NO: 1-5, such as a possible A-G
transition that can occur in the sequence region corresponding
to, for example, nucleotide position 889 of SEQ ID NO:3
that can result in a K or E being present in the corresponding
amino acid sequence represented by, for example, position
297 of SEQ ID NO:4. Similar myosin-like proteins, as well
as uses and applications that are also applicable to the NHPs
described in SEQ ID NO: 1-5, are described in U.S. Pat. No.
6,001,593 herein incorporated by reference in its entirety.

Expression analysis has provided evidence that the NHPs
described in SEQ ID NO: 6-11 can be expressed in a
relatively narrow range of human tissues. In addition to
serine-threonine kinases, the NHPs described in SEQ ID
NO: 6-11 also share significant similarity to a range of
additional kinase families, again including kinases associated
with signal transduction from a variety of phyla and
species. The NHPs described in SEQ ID NO: 6-11 are
apparently encoded on human chromosome 16.

An additional application of the described novel human
polynucleotide sequences is their use in the molecular
mutagenesis/evolution of proteins that are at least partially
encoded by the described novel sequences using, for
example, polynucleotide shuffling or related methodologies.
Such approaches are described in U.S. Pat. Nos. 5,830,721
and 5,837,458 which are herein incorporated by reference in
their entirety.

NHP gene products can also be expressed in transgenic
animals. Animals of any species, including, but not limited
to, worms, mice, rats, rabbits, guinea pigs, pigs, micro-pigs,
birds, goats, and non-human primates, e.g., baboons,
monkeys, and chimpanzees may be used to generate NHP
transgenic animals.

Any technique known in the art may be used to introduce
a NHP transgene into animals to produce the founder lines
of transgenic animals. Such techniques include, but are not
limited to pronuclear microinjection (Hoppe, P. C. and
Wagner, T. E., 1989, U.S. Pat. No. 4,873,191); retrovirus
mediated gene transfer into germ lines (Van der Putten et al.,
1985, Proc. Natl. Acad. Sci., USA 82:6148-6152); gene
targeting in embryonic stem cells (Thompson et al., 1989,
Cell 56:313-321); electroporation of embryos (Lo, 1983,
Mol. Cell. Biol. 3:1803-1814); and sperm-mediated gene
transfer (Lavitrano et al., 1989, Cell 57:717-723); etc. For a
review of such techniques, see Gordon, 1989, Transgenic
Animals, Intl. Rev. Cytol. 115:171-229, which is
incorporated by reference herein in its entirety.

The present invention provides for transgenic animals that
carry the NHP transgene in all their cells, as well as animals
which carry the transgene in some, but not all their cells, i.e.,
mosaic animals or somatic cell transgenic animals. The
transgene may be integrated as a single transgene or in
concatamers, e.g., head-to-head tandems or head-to-tail
tandems. The transgene may also be selectively introduced into
and activated in a particular cell type by following, for
example, the teaching of Lasko et al., 1992, Proc. Natl.
Acad. Sci. USA 89:6232-6236. The regulatory sequences
required for such a cell-type specific activation will depend
upon the particular cell type of interest, and will be apparent
to those of skill in the art.

When it is desired that a NHP transgene be integrated into
the chromosomal site of the endogenous NHP gene, gene
targeting is preferred. Briefly, when such a technique is to be
utilized, vectors containing some nucleotide sequences
homologous to the endogenous NHP gene are designed for
the purpose of integrating, via homologous recombination
with chromosomal sequences, into and disrupting the
function of the nucleotide sequence of the endogenous NHP gene
(i.e., "knockout" animals).

The transgene can also be selectively introduced into a
particular cell type, thus inactivating the endogenous NHP
gene in only that cell type, by following, for example, the
teaching of Gu et al., 1994, Science, 265:103-106. The
regulatory sequences required for such a cell-type specific
inactivation will depend upon the particular cell type of
interest, and will be apparent to those of skill in the art.

Once transgenic animals have been generated, the
expression of the recombinant NHP gene may be assayed utilizing
standard techniques. Initial screening may be accomplished
by Southern blot analysis or PCR techniques to analyze
animal tissues to assay whether integration of the transgene
has taken place. The level of mRNA expression of the
transgene in the tissues of the transgenic animals may also
be assessed using techniques which include but are not
limited to Northern blot analysis of tissue samples obtained
from the animal, in situ hybridization analysis, and RT-PCR.
Samples of NHP gene-expressing tissue may also be
evaluated immunocytochemically using antibodies specific for
the NHP transgene product.

NHPs and NHP Polypeptides

NHPs, polypeptides, peptide fragments, mutated,
truncated, or deleted forms of the NHPs, and/or NHP fusion
proteins can be prepared for a variety of uses. These uses
include but are not limited to the generation of antibodies, as
reagents in diagnostic assays, the identification of other
cellular gene products related to a NHP, as reagents in assays
for screening for compounds that can be used as
pharmaceutical reagents useful in the therapeutic treatment of
mental, biological, or medical disorders and diseases. Given
the similarity information and expression data, the described
NHPs can be targeted (by drugs, oligos, antibodies, etc.) in
order to treat disease, or to therapeutically augment the 
efficacy of, for example, chemotherapeutic agents used in 
the treatment of breast or prostate cancer. 

The Sequence Listing discloses the amino acid sequences 
encoded by the described NHP polynucleotides. The NHPs 
typically display initiator methionines in DNA
sequence contexts consistent with a translation initiation 
site. 

The NHP amino acid sequences of the invention include 
the amino acid sequence presented in the Sequence Listing 
as well as analogues and derivatives thereof. Further,
corresponding NHP homologues from other species are
encompassed by the invention. In fact, any NHP protein encoded
by the NHP nucleotide sequences described above is within
the scope of the invention, as are any novel polynucleotide
sequences encoding all or any novel portion of an amino 
acid sequence presented in the Sequence Listing. The degen- 
erate nature of the genetic code is well known, and, 
accordingly, each amino acid presented in the Sequence 
Listing is generically representative of the well known
nucleic acid "triplet" codon, or in many cases codons, that 
can encode the amino acid. As such, as contemplated herein, 
the amino acid sequences presented in the Sequence Listing, 
when taken together with the genetic code (see, for example, 
Table 4-1 at page 109 of "Molecular Cell Biology", 1986, J. 
Darnell et al. eds., Scientific American Books, New York, 
N.Y., herein incorporated by reference) are generically rep- 
resentative of all the various permutations and combinations 
of nucleic acid sequences that can encode such amino acid 
sequences. 
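
As an illustration of this degeneracy, the following minimal sketch counts how many distinct nucleotide sequences can encode a given peptide under the standard genetic code. The function name and the example peptides are hypothetical and chosen only for illustration; the codon table is built from the conventional TCAG ordering rather than taken from the cited reference.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons that encode each residue ("*" marks a stop codon).
CODONS_PER_RESIDUE = Counter(CODON_TABLE.values())


def count_encodings(peptide: str) -> int:
    """Count the nucleotide sequences that encode `peptide` (one-letter codes)."""
    total = 1
    for residue in peptide.upper():
        if residue == "*" or residue not in CODONS_PER_RESIDUE:
            raise ValueError(f"not a standard amino acid: {residue!r}")
        total *= CODONS_PER_RESIDUE[residue]
    return total


print(count_encodings("MW"))   # 1: Met and Trp each have a single codon
print(count_encodings("MKE"))  # 4: Lys and Glu each have two codons
print(count_encodings("LSR"))  # 216: Leu, Ser, and Arg each have six codons
```

Because the count is a product over residues, it grows very quickly with peptide length, which is why an amino acid sequence is treated as generically representative of the encoding nucleic acid sequences rather than listing each one.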

The invention also encompasses proteins that are func- 
tionally equivalent to the NHPs encoded by the presently 
described nucleotide sequences as judged by any of a 
number of criteria, including, but not limited to, the ability 
to bind and cleave a substrate of a NHP, or the ability to 
effect an identical or complementary downstream pathway, 
or a change in cellular metabolism (e.g., proteolytic activity, 
ion flux, tyrosine phosphorylation, etc.). Such functionally 
equivalent NHP proteins include, but are not limited to, 
additions or substitutions of amino acid residues within the 
amino acid sequence encoded by the NHP nucleotide 
sequences described above, but which result in a silent 
change, thus producing a functionally equivalent gene prod- 
uct. Amino acid substitutions may be made on the basis of 
similarity in polarity, charge, solubility, hydrophobicity, 
hydrophilicity, and/or the amphipathic nature of the residues 
involved. For example, nonpolar (hydrophobic) amino acids 
include alanine, leucine, isoleucine, valine, proline, 
phenylalanine, tryptophan, and methionine; polar neutral 
amino acids include glycine, serine, threonine, cysteine, 
tyrosine, asparagine, and glutamine; positively charged 
(basic) amino acids include arginine, lysine, and histidine; 
and negatively charged (acidic) amino acids include aspartic 
acid and glutamic acid. 
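
The grouping above can be expressed as a simple lookup. The sketch below is illustrative only: the names are hypothetical, and treating a same-class substitution as "conservative" is a rough heuristic that does not by itself establish that a change is functionally silent.

```python
# Residue classes exactly as listed in the text (one-letter codes).
RESIDUE_CLASSES = {
    "nonpolar": set("ALIVPFWM"),      # Ala, Leu, Ile, Val, Pro, Phe, Trp, Met
    "polar neutral": set("GSTCYNQ"),  # Gly, Ser, Thr, Cys, Tyr, Asn, Gln
    "basic": set("RKH"),              # Arg, Lys, His
    "acidic": set("DE"),              # Asp, Glu
}


def residue_class(residue: str) -> str:
    """Return the class name for a one-letter amino acid code."""
    for name, members in RESIDUE_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError(f"not a standard amino acid: {residue!r}")


def is_conservative(original: str, substitute: str) -> bool:
    """True when both residues fall in the same class listed above."""
    return residue_class(original) == residue_class(substitute)


print(is_conservative("K", "R"))  # True: both basic
print(is_conservative("K", "E"))  # False: basic versus acidic
```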

A variety of host-expression vector systems can be used 
to express the NHP nucleotide sequences of the invention. 
Where, as in the present instance, the NHP peptide or 
polypeptide is thought to be a membrane protein, the
hydrophobic regions of the protein can be excised and the
resulting soluble peptide or polypeptide can be recovered from the
culture media. Such expression systems also encompass 
engineered host cells that express a NHP, or functional 
equivalent, in situ. Purification or enrichment of a NHP from 
such expression systems can be accomplished using appro- 
priate detergents and lipid micelles and methods well known 
to those skilled in the art. However, such engineered host
cells themselves may be used in situations where it is
important not only to retain the structural and functional 
characteristics of the NHP, but to assess biological activity, 
e.g., in drug screening assays. 

The expression systems that may be used for purposes of
the invention include but are not limited to microorganisms
such as bacteria (e.g., E. coli, B. subtilis) transformed with
recombinant bacteriophage DNA, plasmid DNA or cosmid
DNA expression vectors containing NHP nucleotide
sequences; yeast (e.g., Saccharomyces, Pichia) transformed
with recombinant yeast expression vectors containing NHP
nucleotide sequences; insect cell systems infected with
recombinant virus expression vectors (e.g., baculovirus)
containing NHP sequences; plant cell systems infected with
recombinant virus expression vectors (e.g., cauliflower
mosaic virus, CaMV; tobacco mosaic virus, TMV) or
transformed with recombinant plasmid expression vectors (e.g.,
Ti plasmid) containing NHP nucleotide sequences; or
mammalian cell systems (e.g., COS, CHO, BHK, 293, 3T3)
harboring recombinant expression constructs containing
promoters derived from the genome of mammalian cells
(e.g., metallothionein promoter) or from mammalian viruses
(e.g., the adenovirus late promoter; the vaccinia virus 7.5K
promoter).

In bacterial systems, a number of expression vectors may
be advantageously selected depending upon the use intended
for the NHP product being expressed. For example, when a
large quantity of such a protein is to be produced for the
generation of pharmaceutical compositions of or containing
NHP, or for raising antibodies to a NHP, vectors that direct
the expression of high levels of fusion protein products that
are readily purified may be desirable. Such vectors include,
but are not limited to, the E. coli expression vector pUR278
(Ruther et al., 1983, EMBO J. 2:1791), in which a NHP
coding sequence may be ligated individually into the vector
in frame with the lacZ coding region so that a fusion protein
is produced; pIN vectors (Inouye & Inouye, 1985, Nucleic
Acids Res. 13:3101-3109; Van Heeke & Schuster, 1989, J.
Biol. Chem. 264:5503-5509); and the like. pGEX vectors
(Pharmacia or American Type Culture Collection) can also
be used to express foreign polypeptides as fusion proteins
with glutathione S-transferase (GST). In general, such
fusion proteins are soluble and can easily be purified from
lysed cells by adsorption to glutathione-agarose beads
followed by elution in the presence of free glutathione. The
pGEX vectors are designed to include thrombin or factor Xa
protease cleavage sites so that the cloned target gene product
can be released from the GST moiety.

In an insect system, Autographa californica nuclear
polyhedrosis virus (AcNPV) is used as a vector to express foreign
genes. The virus grows in Spodoptera frugiperda cells. A
NHP coding sequence may be cloned individually into
non-essential regions (for example the polyhedrin gene) of
the virus and placed under control of an AcNPV promoter
(for example the polyhedrin promoter). Successful insertion
of NHP coding sequence will result in inactivation of the
polyhedrin gene and production of non-occluded
recombinant virus (i.e., virus lacking the proteinaceous coat coded
for by the polyhedrin gene). These recombinant viruses are
then used to infect Spodoptera frugiperda cells in which the
inserted sequence is expressed (e.g., see Smith et al., 1983,
J. Virol. 46:584; Smith, U.S. Pat. No. 4,215,051).

In mammalian host cells, a number of viral-based
expression systems may be utilized. In cases where an adenovirus
is used as an expression vector, the NHP nucleotide
sequence of interest may be ligated to an adenovirus
transcription/translation control complex, e.g., the late
promoter and tripartite leader sequence.
This chimeric gene may then be inserted in the adenovirus
genome by in vitro or in vivo recombination. Insertion in a
non-essential region of the viral genome (e.g., region E1 or
E3) will result in a recombinant virus that is viable and
capable of expressing a NHP product in infected hosts (e.g.,
see Logan & Shenk, 1984, Proc. Natl. Acad. Sci. USA
81:3655-3659). Specific initiation signals may also be 
required for efficient translation of inserted NHP nucleotide 
sequences. These signals include the ATG initiation codon 
and adjacent sequences. In cases where an entire NHP gene 
or cDNA, including its own initiation codon and adjacent 
sequences, is inserted into the appropriate expression vector, 
no additional translational control signals may be needed. 
However, in cases where only a portion of a NHP coding 
sequence is inserted, exogenous translational control 
signals, including, perhaps, the ATG initiation codon, must 
be provided. Furthermore, the initiation codon must be in 
phase with the reading frame of the desired coding sequence 
to ensure translation of the entire insert. These exogenous 
translational control signals and initiation codons can be of 
a variety of origins, both natural and synthetic. The effi- 
ciency of expression may be enhanced by the inclusion of 
appropriate transcription enhancer elements, transcription 
terminators, etc. (See Bitter et al., 1987, Methods in Enzy- 
mol. 153:516-544). 
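
The requirement that an exogenous initiation codon be in phase with the inserted coding sequence can be checked arithmetically: the distance from the first base of the ATG to the first base of the insert's first complete codon must be a multiple of three. The sketch below is a simplified illustration under stated assumptions. It takes the first ATG in the construct to be the initiation codon, and the function name and example constructs are hypothetical.

```python
def insert_in_phase(construct: str, insert_start: int) -> bool:
    """Check whether the insert is in frame with the first upstream ATG.

    construct    -- full sequence of vector leader plus insert (5' to 3')
    insert_start -- 0-based index of the first base of the insert's
                    first complete codon

    Assumes the first ATG in `construct` is the initiation codon.
    """
    atg_index = construct.upper().find("ATG")
    if atg_index < 0 or atg_index > insert_start:
        return False  # no initiation codon upstream of the insert
    return (insert_start - atg_index) % 3 == 0


# Hypothetical leader ("GCCACC" + ATG + one linker codon) and insert.
insert = "AAAGAACTG"
in_frame = "GCCACC" + "ATG" + "GGT" + insert
shifted = "GCCACC" + "ATG" + "GGTC" + insert  # one extra linker base

print(insert_in_phase(in_frame, len(in_frame) - len(insert)))  # True
print(insert_in_phase(shifted, len(shifted) - len(insert)))    # False
```

A single extra or missing base between the initiation codon and the insert shifts the reading frame, so the insert would no longer be translated as intended.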

In addition, a host cell strain may be chosen that modu- 
lates the expression of the inserted sequences, or modifies 
and processes the gene product in the specific fashion 
desired. Such modifications (e.g., glycosylation) and pro- 
cessing (e.g., cleavage) of protein products may be impor- 
tant for the function of the protein. Different host cells have 
characteristic and specific mechanisms for the post- 
translational processing and modification of proteins and 
gene products. Appropriate cell lines or host systems can be 
chosen to ensure the correct modification and processing of 
the foreign protein expressed. To this end, eukaryotic host 
cells which possess the cellular machinery for proper pro- 
cessing of the primary transcript, glycosylation, and phos- 
phorylation of the gene product may be used. Such mam- 
malian host cells include, but are not limited to, CHO, 
VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and in 
particular, human cell lines. 

For long-term, high-yield production of recombinant 
proteins, stable expression is preferred. For example, cell 
lines which stably express the NHP sequences described 
above can be engineered. Rather than using expression 
vectors which contain viral origins of replication, host cells 
can be transformed with DNA controlled by appropriate 
expression control elements (e.g., promoter, enhancer 
sequences, transcription terminators, polyadenylation sites, 
etc.), and a selectable marker. Following the introduction of 
the foreign DNA, engineered cells may be allowed to grow 
for 1-2 days in an enriched media, and then are switched to 
a selective media. The selectable marker in the recombinant 
plasmid confers resistance to the selection and allows cells 
to stably integrate the plasmid into their chromosomes and 
grow to form foci which in turn can be cloned and expanded 
into cell lines. This method may advantageously be used to 
engineer cell lines which express the NHP product. Such 
engineered cell lines may be particularly useful in screening 
and evaluation of compounds that affect the endogenous 
activity of the NHP product. 

A number of selection systems may be used, including but
not limited to the herpes simplex virus thymidine kinase
(Wigler, et al., 1977, Cell 11:223), hypoxanthine-guanine
phosphoribosyltransferase (Szybalska & Szybalski, 1962,
Proc. Natl. Acad. Sci. USA 48:2026), and adenine
phosphoribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes,
which can be employed in tk⁻, hgprt⁻ or aprt⁻ cells, respectively.
Also, antimetabolite resistance can be used as the basis of
selection for the following genes: dhfr, which confers
resistance to methotrexate (Wigler, et al., 1980, Proc. Natl. Acad. Sci.
USA 77:3567; O'Hare, et al., 1981, Proc. Natl. Acad. Sci.
USA 78:1527); gpt, which confers resistance to
mycophenolic acid (Mulligan & Berg, 1981, Proc. Natl. Acad. Sci.
USA 78:2072); neo, which confers resistance to the
aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. Mol.
Biol. 150:1); and hygro, which confers resistance to
hygromycin (Santerre, et al., 1984, Gene 30:147).

Alternatively, any fusion protein can be readily purified
by utilizing an antibody specific for the fusion protein being
expressed. For example, a system described by Janknecht et
al. allows for the ready purification of non-denatured fusion
proteins expressed in human cell lines (Janknecht, et al.,
1991, Proc. Natl. Acad. Sci. USA 88:8972-8976). In this
system, the gene of interest is subcloned into a vaccinia
recombination plasmid such that the gene's open reading
frame is translationally fused to an amino-terminal tag
consisting of six histidine residues. Extracts from cells
infected with recombinant vaccinia virus are loaded onto
Ni²⁺ nitriloacetic acid-agarose columns and histidine-tagged
proteins are selectively eluted with imidazole-containing
buffers.

Also encompassed by the present invention are fusion
proteins that direct the NHP to a target organ and/or facilitate
transport across the membrane into the cytosol. Conjugation
of NHPs to antibody molecules or their Fab fragments could
be used to target cells bearing a particular epitope. Attaching
the appropriate signal sequence to the NHP would also
transport the NHP to the desired location within the cell.
Alternatively, targeting of NHP or its nucleic acid sequence
might be achieved using liposome or lipid complex based
delivery systems. Such technologies are described in
"Liposomes: A Practical Approach", New, RRC ed., Oxford
University Press, New York and in U.S. Pat. Nos. 4,594,595,
5,459,127, 5,948,767 and 6,110,490 and their respective
disclosures which are herein incorporated by reference in
their entirety. Additionally embodied are novel protein
constructs engineered in such a way that they facilitate transport
of the NHP to the target site or desired organ. This goal may
be achieved by coupling of the NHP to a cytokine or other
ligand that provides targeting specificity, and/or to a protein
transducing domain (see generally U.S. applications Ser.
Nos. 60/111,701 and 60/056,713, both of which are herein
incorporated by reference, for examples of such transducing
sequences) to facilitate passage across cellular membranes if
needed and can optionally be engineered to include nuclear
localization sequences when desired.

Antibodies to NHP Products

Antibodies that specifically recognize one or more
epitopes of a NHP, or epitopes of conserved variants of a
NHP, or peptide fragments of a NHP are also encompassed
by the invention. Such antibodies include but are not limited
to polyclonal antibodies, monoclonal antibodies (mAbs),
humanized or chimeric antibodies, single chain antibodies,
Fab fragments, F(ab')2 fragments, fragments produced by a
Fab expression library, anti-idiotypic (anti-Id) antibodies,
and epitope-binding fragments of any of the above.

The antibodies of the invention may be used, for example,
in the detection of NHP in a biological sample and may,
therefore, be utilized as part of a diagnostic or prognostic
technique whereby patients may be tested for abnormal
amounts of NHP. Such antibodies may also be utilized in
conjunction with, for example, compound screening
schemes for the evaluation of the effect of test compounds
on expression and/or activity of a NHP gene product.
Additionally, such antibodies can be used in conjunction with
gene therapy to, for example, evaluate the normal and/or
engineered NHP-expressing cells prior to their introduction
into the patient. Such antibodies may additionally be used as
a method for the inhibition of abnormal NHP activity. Thus,
such antibodies may, therefore, be utilized as part of
treatment methods.

For the production of antibodies, various host animals
may be immunized by injection with a NHP, an NHP peptide
(e.g., one corresponding to a functional domain of an NHP),
truncated NHP polypeptides (NHP in which one or more
domains have been deleted), functional equivalents of the
NHP or mutated variant of the NHP. Such host animals may
include but are not limited to pigs, rabbits, mice, goats, and
rats, to name but a few. Various adjuvants may be used to
increase the immunological response, depending on the host
species, including but not limited to Freund's adjuvant
(complete and incomplete), mineral salts such as aluminum
hydroxide or aluminum phosphate, surface active substances
such as lysolecithin, pluronic polyols, polyanions, peptides,
oil emulsions, and potentially useful human adjuvants such
as BCG (bacille Calmette-Guerin) and Corynebacterium
parvum. Alternatively, the immune response could be
enhanced by combination and/or coupling with molecules
such as keyhole limpet hemocyanin, tetanus toxoid,
diphtheria toxoid, ovalbumin, cholera toxin or fragments thereof.
Polyclonal antibodies are heterogeneous populations of
antibody molecules derived from the sera of the immunized
animals.

Monoclonal antibodies, which are homogeneous
populations of antibodies to a particular antigen, can be obtained by
any technique which provides for the production of antibody
molecules by continuous cell lines in culture. These include,
but are not limited to, the hybridoma technique of Kohler
and Milstein, (1975, Nature 256:495-497; and U.S. Pat. No.
4,376,110), the human B-cell hybridoma technique (Kosbor
et al., 1983, Immunology Today 4:72; Cole et al., 1983,
Proc. Natl. Acad. Sci. USA 80:2026-2030), and the
EBV-hybridoma technique (Cole et al., 1985, Monoclonal
Antibodies And Cancer Therapy, Alan R. Liss, Inc., pp. 77-96).
Such antibodies may be of any immunoglobulin class
including IgG, IgM, IgE, IgA, IgD and any subclass thereof.
The hybridoma producing the mAb of this invention may be
cultivated in vitro or in vivo. Production of high titers of
mAbs in vivo makes this the presently preferred method of
production.

In addition, techniques developed for the production of
"chimeric antibodies" (Morrison et al., 1984, Proc. Natl. 
Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature, 
312:604-608; Takeda et al., 1985, Nature, 314:452-454) by 
splicing the genes from a mouse antibody molecule of 
appropriate antigen specificity together with genes from a
human antibody molecule of appropriate biological activity 
can be used. A chimeric antibody is a molecule in which 
different portions are derived from different animal species, 
such as those having a variable region derived from a murine 
mAb and a human immunoglobulin constant region. Such 
technologies are described in U.S. Pat. Nos. 6,075,181 and 
5,877,397 and their respective disclosures which are herein 
incorporated by reference in their entirety. Also encompassed
by the present invention is the use of fully humanized
monoclonal antibodies as described in U.S. Pat. No. 6,150,584
and respective disclosures which are herein incorporated
by reference in their entirety.

Alternatively, techniques described for the production of 
single chain antibodies (U.S. Pat. No. 4,946,778; Bird, 1988, 
Science 242:423-426; Huston et al., 1988, Proc. Natl. Acad. 
Sci. USA 85:5879-5883; and Ward et al., 1989, Nature 
341:544-546) can be adapted to produce single chain antibodies
against NHP gene products. Single chain antibodies
are formed by linking the heavy and light chain fragments of 
the Fv region via an amino acid bridge, resulting in a single 
chain polypeptide. 

Antibody fragments which recognize specific epitopes 
may be generated by known techniques. For example, such 
fragments include, but are not limited to: the F(ab')2 fragments
which can be produced by pepsin digestion of the
antibody molecule and the Fab fragments which can be
generated by reducing the disulfide bridges of the F(ab')2
fragments. Alternatively, Fab expression libraries may be
constructed (Huse et al., 1989, Science, 246:1275-1281) to
allow rapid and easy identification of monoclonal Fab 
fragments with the desired specificity. 

Antibodies to a NHP can, in turn, be utilized to generate 
anti-idiotype antibodies that "mimic" a given NHP, using 
techniques well known to those skilled in the art. (See, e.g., 
Greenspan & Bona, 1993, FASEB J 7(5):437-444; and 
Nisonoff, 1991, J. Immunol. 147(8):2429-2438). For
example, antibodies which bind to a NHP domain and
competitively inhibit the binding of NHP to its cognate 
receptor can be used to generate anti-idiotypes that "mimic" 
the NHP and, therefore, bind and activate or neutralize a 
receptor. Such anti-idiotypic antibodies or Fab fragments of 
such anti-idiotypes can be used in therapeutic regimens 
involving a NHP mediated pathway. 

The present invention is not to be limited in scope by the 
specific embodiments described herein, which are intended 
as single illustrations of individual aspects of the invention, 
and functionally equivalent methods and components are 
within the scope of the invention. Indeed, various modifications
of the invention, in addition to those shown and
described herein will become apparent to those skilled in the
art from the foregoing description. Such modifications are
intended to fall within the scope of the appended claims. All
cited publications, patents, and patent applications are
herein incorporated by reference in their entirety.



SEQUENCE LISTING 



<160> NUMBER OF SEQ ID NOS: 11 

<210> SEQ ID NO 1 
<211> LENGTH: 717 
<212> TYPE: DNA 
<213> ORGANISM: homo sapiens 
<400> SEQUENCE: 1 

atgatgcttg gacttgaatc acttccagat cccacagaca cctgggaaat tatagagacc 60 

attggtaaag gcacctatgg caaagtctac aaggtaacta acaagagaga tgggagcctg 120 

gctgcagtga aaattctgga tccagtcagt gatatggatg aagaaattga ggcagaatac 180 

aacattttgc agttccttcc taatcatccc aatgttgtaa agttttatgg gatgttttac 240 

aaagcggatc actgtgtagg gggacagctg tggctggtcc tggagctgtg taatgggggc 300 

tcagtcacyg agcttgtcaa aggtctactc agatgtggcc agcggttgga tgaagcaatg 360 

atctcataca tcttgtacgg ggccctcttg ggccttcagc atttgcacaa caaccgaatc 420 

atccaccgtg atgtgaaggg gaataacatt cttctgacaa cagaaggagg agttaagctc 480 

gttgactttg gtgtttcagc tcaactcacc agtacacgtc tgcggagaaa cacatctgtt 540 

ggcaccccat tctggatggc ccctgaggtc attgcctgtg agcagcagta tgactcttcc 600 

tatgacgctc gctgtgacgt ctggtccttg gggatcacag ctattgaact gggggatgga 660 

gaccctcccc tctttgacat gcatcctgtg aaaacactct ttaagattcc aaggtaa 717 



<210> SEQ ID NO 2 

<211> LENGTH: 238 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 2 

Met Met Leu Gly Leu Glu Ser Leu Pro Asp Pro Thr Asp Thr Trp Glu
1 5 10 15

Ile Ile Glu Thr Ile Gly Lys Gly Thr Tyr Gly Lys Val Tyr Lys Val
20 25 30

Thr Asn Lys Arg Asp Gly Ser Leu Ala Ala Val Lys Ile Leu Asp Pro
35 40 45

Val Ser Asp Met Asp Glu Glu Ile Glu Ala Glu Tyr Asn Ile Leu Gln
50 55 60

Phe Leu Pro Asn His Pro Asn Val Val Lys Phe Tyr Gly Met Phe Tyr
65 70 75 80

Lys Ala Asp His Cys Val Gly Gly Gln Leu Trp Leu Val Leu Glu Leu
85 90 95

Cys Asn Gly Gly Ser Val Thr Glu Leu Val Lys Gly Leu Leu Arg Cys
100 105 110

Gly Gln Arg Leu Asp Glu Ala Met Ile Ser Tyr Ile Leu Tyr Gly Ala
115 120 125

Leu Leu Gly Leu Gln His Leu His Asn Asn Arg Ile Ile His Arg Asp
130 135 140

Val Lys Gly Asn Asn Ile Leu Leu Thr Thr Glu Gly Gly Val Lys Leu
145 150 155 160

Val Asp Phe Gly Val Ser Ala Gln Leu Thr Ser Thr Arg Leu Arg Arg
165 170 175

Asn Thr Ser Val Gly Thr Pro Phe Trp Met Ala Pro Glu Val Ile Ala
180 185 190

Cys Glu Gln Gln Tyr Asp Ser Ser Tyr Asp Ala Arg Cys Asp Val Trp
195 200 205

Ser Leu Gly Ile Thr Ala Ile Glu Leu Gly Asp Gly Asp Pro Pro Leu
210 215 220

Phe Asp Met His Pro Val Lys Thr Leu Phe Lys Ile Pro Arg
225 230 235



<210> SEQ ID NO 3 

<211> LENGTH: 3711

<212> TYPE: DNA

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 3 



atgatgcttg gacttgaatc acttccagat cccacagaca cctgggaaat tatagagacc 60

attggtaaag gcacctatgg caaagtctac aaggtaacta acaagagaga tgggagcctg 120

gctgcagtga aaattctgga tccagtcagt gatatggatg aagaaattga ggcagaatac 180

aacattttgc agttccttcc taatcatccc aatgttgtaa agttttatgg gatgttttac 240

aaagcggatc actgtgtagg gggacagctg tggctggtcc tggagctgtg taatgggggc 300

tcagtcacyg agcttgtcaa aggtctactc agatgtggcc agcggttgga tgaagcaatg 360

atctcataca tcttgtacgg ggccctcttg ggccttcagc atttgcacaa caaccgaatc 420

atccaccgtg atgtgaaggg gaataacatt cttctgacaa cagaaggagg agttaagctc 480

gttgactttg gtgtttcagc tcaactcacc agtacacgtc tgcggagaaa cacatctgtt 540

ggcaccccat tctggatggc ccctgaggtc attgcctgtg agcagcagta tgactcttcc 600

tatgacgctc gctgtgacgt ctggtccttg gggatcacag ctattgaact gggggatgga 660

gaccctcccc tctttgacat gcatcctgtg aaaacactct ttaagattcc aagaaatcct 720

ccacctactt tacttcatcc agaaaaatgg tgtgaagaat tcaaccactt tatttcacag 780

tgtcttatta aggattttga aaggcgacct tccgtcacac atctccttga ccacccattt 840

attaaaggag tacatggaaa agttctgttt ctgcaaaaac agctggccra ggttctccaa 900

gaccagaagc atcaaaatcc tgttgctaaa accaggcatg agaggatgca taccagaaga 960

ccttatcatg tggaagatgc tgaaaaatac tgccttgagg atgatttggt caacctagag 1020

gttctggatg aggatacaat tatccatcag ttgcagaaac gttatgcaga cttgctaatt 1080

tacacatatg ttggagacat cttaattgcc ttaaacccct tccagaatct aagcatatac 1140

tctccacagt tttccagact ttatcatggg gtgaaacgcg cctccaaycc cccccacata 1200

tttgcatcag cagatgctgc ttaccagtgc atggttactc tcagcaaaga ccagtgcatt 1260

gtcatcagcg gagagagtgg ctctgggaag acagaaagcg cccacctgat tgttcarcat 1320

ttgactttct tgggaaaggc caataatcag accttgagag agaaaattct acaagtcaac 1380

tccctggtgg aagcctttgg gaactcatgc actgccatca atgacaactc gagccgtttt 1440

ggaaaatatc tggaaatgat gtttacacca actggagttg tgatgggggc aagaatctct 1500

gaatatctcc tggaaaaatc cagagttata aaacaggcag cgagagagaa aaattttcat 1560

atattttact atatttatgc tggtcttcat caccaaaaga agctttctga tttcagactt 1620

cctgaggaaa aacctcctag gtacatagct gatgaaactg gaagggtgat gcacgacata 1680

acttccaagg agtcttacag aagacaattc gaagcaattc agcattgctt caggattata 1740

gggttcacgg acaaagaggt gcactcagtg tacagaattt tggctgggat tttgaatatt 1800

gggaacattg agttcgcagc tatttcctct caacatcaga ctgataaaag tgaggtgccc 1860

aatgctgaag ctttgcaaaa tgctgcctct gttctgtgca ttagccctga agagctccag 1920

gaggccctca cctcccactg tgtggtcacc cggggcgaga ccatcatccg tgccaacact 1980

gtagacaggg ctgcggacgt tcgagacgcc atgtccaaag ccctgtatgg gaggctcttc 2040

agctggattg tgaatcgcat taatacactc ctgcagccag acgaaaacat atgtagtgca 2100

ggaggtggaa tgaatgtggg gatcttggat atctttggat tcgagaattt tcagagaaat 2160

tcatttgagc agctctgcat aaacatcgcc aatgagcaaa tccagtacta tttcaatcag 2220

catgtttttg ctcttgagca gatggaatat cagaatgaag gcattgatgc tatacccgtg 2280

gaatatgagg acaaccgccc gctcctggac atgttcctcc agaaacccct gggactgctt 2340

gcacttttgg atgaggaaag tcggtttccc caagcaactg accagaccct ggttgataaa 2400

tttgaagata atctacgatg caaatacttc tggaggccca aaggagtgga actgtgcttt 2460

ggcattcagc attatgctgg aaaggtatta tatgatgctt ctggggttct tgagaaaaat 2520

agagacactc tccctgccga tgtggttgtg gtcctgagaa cgtcagaaaa caagcttctt 2580

cagcagctct tctcaatccc tctgaccaaa acaggtaatt tggcccagac aagagctagg 2640

ataacagtgg cctcaagttc tttgcctcca catttcagtg ctgggaaagc caaggtggac 2700

actctggagg tgatacggca tccggaagaa accaccaaca tgaagaggca aactgtggct 2760

tcttacttcc ggtattctct gatggacctg ctctccaaaa tggtggttgg acagccccac 2820

tttgtgcgct gcattaaacc caatgatgac cgagaggccc tgcagttctc tcgagagagg 2880

gtgctggccc agctccgctc cacagggatt ctggagacag tcagcatccg ccgccagggc 2940

tattcccacc gcatcctttt tgaagaattt gtgaaaaggt attattactt ggcattcaca 3000

gcacatcaaa cacctcttgc tagcaaagag agctgtgtgg ctatcttgga aaagtccaga 3060

ttagatcact gggtgctggg aaaaacaaag gtttttctca aatattacca tgttgagcaa 3120

ytaaatttgc tgcttcgaga agtcataggc agagtggttg tgctgcaggc atataccaag 3180

gggtggcttg gagccaggag atacaaaaag gtcagagaga agagagagaa gggagccatt 3240

gccatccagt cagcctggag aggatatgat gctcggagga aatttaagaa aataagcaac 3300

agaaggaatg agtctgctgc tcataatcaa gcaggggcca cttcaaacca aagcagtggg 3360

ccacattccc ccgtcgcagc aggtacgagg ggaagtgccg aggttcaaga ctgcagcgag 3420

cctggtgacc ataaagttct caggggctct gtacatcgta ggagccattc acaagcagaa 3480

tccaacaatg gccgtacaca gacttcaagc aactctcctg ctgtcacaga gaaaaatggg 3540

cattcacaag cccagagttc tccaaaaggg tgcgatatct tcgcaggaca tgcaaacaag 3600

gtagctggat atcttgattc caaagtaaat gtgtatcact ccttcagact catccaagtt 3660

cataggcatg aagcttgtct gcggctgcgt ggttggacca tccaaacttg a 3711



<210> SEQ ID NO 4 

<211> LENGTH: 1236 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 4 

Met Met Leu Gly Leu Glu Ser Leu Pro Asp Pro Thr Asp Thr Trp Glu
1 5 10 15

Ile Ile Glu Thr Ile Gly Lys Gly Thr Tyr Gly Lys Val Tyr Lys Val
20 25 30

Thr Asn Lys Arg Asp Gly Ser Leu Ala Ala Val Lys Ile Leu Asp Pro
35 40 45

Val Ser Asp Met Asp Glu Glu Ile Glu Ala Glu Tyr Asn Ile Leu Gln
50 55 60

Phe Leu Pro Asn His Pro Asn Val Val Lys Phe Tyr Gly Met Phe Tyr
65 70 75 80

Lys Ala Asp His Cys Val Gly Gly Gln Leu Trp Leu Val Leu Glu Leu
85 90 95

Cys Asn Gly Gly Ser Val Thr Glu Leu Val Lys Gly Leu Leu Arg Cys
100 105 110

Gly Gln Arg Leu Asp Glu Ala Met Ile Ser Tyr Ile Leu Tyr Gly Ala
115 120 125

Leu Leu Gly Leu Gln His Leu His Asn Asn Arg Ile Ile His Arg Asp
130 135 140

Val Lys Gly Asn Asn Ile Leu Leu Thr Thr Glu Gly Gly Val Lys Leu
145 150 155 160

Val Asp Phe Gly Val Ser Ala Gln Leu Thr Ser Thr Arg Leu Arg Arg
165 170 175

Asn Thr Ser Val Gly Thr Pro Phe Trp Met Ala Pro Glu Val Ile Ala
180 185 190

Cys Glu Gln Gln Tyr Asp Ser Ser Tyr Asp Ala Arg Cys Asp Val Trp
195 200 205

Ser Leu Gly Ile Thr Ala Ile Glu Leu Gly Asp Gly Asp Pro Pro Leu
210 215 220

Phe Asp Met His Pro Val Lys Thr Leu Phe Lys Ile Pro Arg Asn Pro
225 230 235 240

Pro Pro Thr Leu Leu His Pro Glu Lys Trp Cys Glu Glu Phe Asn His
245 250 255

Phe Ile Ser Gln Cys Leu Ile Lys Asp Phe Glu Arg Arg Pro Ser Val
260 265 270

Thr His Leu Leu Asp His Pro Phe Ile Lys Gly Val His Gly Lys Val
275 280 285

Leu Phe Leu Gln Lys Gln Leu Ala Lys Val Leu Gln Asp Gln Lys His
290 295 300

Gln Asn Pro Val Ala Lys Thr Arg His Glu Arg Met His Thr Arg Arg
305 310 315 320

Pro Tyr His Val Glu Asp Ala Glu Lys Tyr Cys Leu Glu Asp Asp Leu
325 330 335

Val Asn Leu Glu Val Leu Asp Glu Asp Thr Ile Ile His Gln Leu Gln
340 345 350

Lys Arg Tyr Ala Asp Leu Leu Ile Tyr Thr Tyr Val Gly Asp Ile Leu
355 360 365

Ile Ala Leu Asn Pro Phe Gln Asn Leu Ser Ile Tyr Ser Pro Gln Phe
370 375 380

Ser Arg Leu Tyr His Gly Val Lys Arg Ala Ser Asn Pro Pro His Ile
385 390 395 400

Phe Ala Ser Ala Asp Ala Ala Tyr Gln Cys Met Val Thr Leu Ser Lys
405 410 415

Asp Gln Cys Ile Val Ile Ser Gly Glu Ser Gly Ser Gly Lys Thr Glu
420 425 430

Ser Ala His Leu Ile Val Gln His Leu Thr Phe Leu Gly Lys Ala Asn
435 440 445

Asn Gln Thr Leu Arg Glu Lys Ile Leu Gln Val Asn Ser Leu Val Glu
450 455 460

Ala Phe Gly Asn Ser Cys Thr Ala Ile Asn Asp Asn Ser Ser Arg Phe
465 470 475 480

Gly Lys Tyr Leu Glu Met Met Phe Thr Pro Thr Gly Val Val Met Gly
485 490 495

Ala Arg Ile Ser Glu Tyr Leu Leu Glu Lys Ser Arg Val Ile Lys Gln
500 505 510

Ala Ala Arg Glu Lys Asn Phe His Ile Phe Tyr Tyr Ile Tyr Ala Gly
515 520 525

Leu His His Gln Lys Lys Leu Ser Asp Phe Arg Leu Pro Glu Glu Lys
530 535 540

Pro Pro Arg Tyr Ile Ala Asp Glu Thr Gly Arg Val Met His Asp Ile
545 550 555 560

Thr Ser Lys Glu Ser Tyr Arg Arg Gln Phe Glu Ala Ile Gln His Cys
565 570 575

Phe Arg Ile Ile Gly Phe Thr Asp Lys Glu Val His Ser Val Tyr Arg
580 585 590

Ile Leu Ala Gly Ile Leu Asn Ile Gly Asn Ile Glu Phe Ala Ala Ile
595 600 605

Ser Ser Gln His Gln Thr Asp Lys Ser Glu Val Pro Asn Ala Glu Ala
610 615 620

Leu Gln Asn Ala Ala Ser Val Leu Cys Ile Ser Pro Glu Glu Leu Gln
625 630 635 640

Glu Ala Leu Thr Ser His Cys Val Val Thr Arg Gly Glu Thr Ile Ile
645 650 655

Arg Ala Asn Thr Val Asp Arg Ala Ala Asp Val Arg Asp Ala Met Ser
660 665 670

Lys Ala Leu Tyr Gly Arg Leu Phe Ser Trp Ile Val Asn Arg Ile Asn
675 680 685

Thr Leu Leu Gln Pro Asp Glu Asn Ile Cys Ser Ala Gly Gly Gly Met
690 695 700

Asn Val Gly Ile Leu Asp Ile Phe Gly Phe Glu Asn Phe Gln Arg Asn
705 710 715 720

Ser Phe Glu Gln Leu Cys Ile Asn Ile Ala Asn Glu Gln Ile Gln Tyr
725 730 735

Tyr Phe Asn Gln His Val Phe Ala Leu Glu Gln Met Glu Tyr Gln Asn
740 745 750

Glu Gly Ile Asp Ala Ile Pro Val Glu Tyr Glu Asp Asn Arg Pro Leu
755 760 765

Leu Asp Met Phe Leu Gln Lys Pro Leu Gly Leu Leu Ala Leu Leu Asp
770 775 780

Glu Glu Ser Arg Phe Pro Gln Ala Thr Asp Gln Thr Leu Val Asp Lys
785 790 795 800

Phe Glu Asp Asn Leu Arg Cys Lys Tyr Phe Trp Arg Pro Lys Gly Val
805 810 815

Glu Leu Cys Phe Gly Ile Gln His Tyr Ala Gly Lys Val Leu Tyr Asp
820 825 830

Ala Ser Gly Val Leu Glu Lys Asn Arg Asp Thr Leu Pro Ala Asp Val
835 840 845

Val Val Val Leu Arg Thr Ser Glu Asn Lys Leu Leu Gln Gln Leu Phe
850 855 860

Ser Ile Pro Leu Thr Lys Thr Gly Asn Leu Ala Gln Thr Arg Ala Arg
865 870 875 880

Ile Thr Val Ala Ser Ser Ser Leu Pro Pro His Phe Ser Ala Gly Lys
885 890 895

Ala Lys Val Asp Thr Leu Glu Val Ile Arg His Pro Glu Glu Thr Thr
900 905 910

Asn Met Lys Arg Gln Thr Val Ala Ser Tyr Phe Arg Tyr Ser Leu Met
915 920 925

Asp Leu Leu Ser Lys Met Val Val Gly Gln Pro His Phe Val Arg Cys
930 935 940

Ile Lys Pro Asn Asp Asp Arg Glu Ala Leu Gln Phe Ser Arg Glu Arg
945 950 955 960

Val Leu Ala Gln Leu Arg Ser Thr Gly Ile Leu Glu Thr Val Ser Ile
965 970 975

Arg Arg Gln Gly Tyr Ser His Arg Ile Leu Phe Glu Glu Phe Val Lys
980 985 990

Arg Tyr Tyr Tyr Leu Ala Phe Thr Ala His Gln Thr Pro Leu Ala Ser
995 1000 1005

Lys Glu Ser Cys Val Ala Ile Leu Glu Lys Ser Arg Leu Asp His Trp
1010 1015 1020

Val Leu Gly Lys Thr Lys Val Phe Leu Lys Tyr Tyr His Val Glu Gln
1025 1030 1035 1040

Leu Asn Leu Leu Leu Arg Glu Val Ile Gly Arg Val Val Val Leu Gln
1045 1050 1055

Ala Tyr Thr Lys Gly Trp Leu Gly Ala Arg Arg Tyr Lys Lys Val Arg
1060 1065 1070

Glu Lys Arg Glu Lys Gly Ala Ile Ala Ile Gln Ser Ala Trp Arg Gly
1075 1080 1085

Tyr Asp Ala Arg Arg Lys Phe Lys Lys Ile Ser Asn Arg Arg Asn Glu
1090 1095 1100

Ser Ala Ala His Asn Gln Ala Gly Ala Thr Ser Asn Gln Ser Ser Gly
1105 1110 1115 1120

Pro His Ser Pro Val Ala Ala Gly Thr Arg Gly Ser Ala Glu Val Gln
1125 1130 1135

Asp Cys Ser Glu Pro Gly Asp His Lys Val Leu Arg Gly Ser Val His
1140 1145 1150

Arg Arg Ser His Ser Gln Ala Glu Ser Asn Asn Gly Arg Thr Gln Thr
1155 1160 1165

Ser Ser Asn Ser Pro Ala Val Thr Glu Lys Asn Gly His Ser Gln Ala
1170 1175 1180

Gln Ser Ser Pro Lys Gly Cys Asp Ile Phe Ala Gly His Ala Asn Lys
1185 1190 1195 1200

Val Ala Gly Tyr Leu Asp Ser Lys Val Asn Val Tyr His Ser Phe Arg
1205 1210 1215

Leu Ile Gln Val His Arg His Glu Ala Cys Leu Arg Leu Arg Gly Trp
1220 1225 1230

Thr Ile Gln Thr
1235



<210> SEQ ID NO 5 
<211> LENGTH: 4034 
<212> TYPE: DNA
<213> ORGANISM: homo sapiens
<220> FEATURE:

<221> NAME/KEY: misc_feature
<222> LOCATION: (1)...(4034)
<223> OTHER INFORMATION: n = A,T,C or G

<400> SEQUENCE: 5 

ttttcaacaa gatggagtct tgctctgttt cccagcctgt agtgcagtga cacagtcttg 60
gctcactgta acctctgcct cctgggttca agtgattctc ctgcctcagc ctcctgagta 120
gctgggatta caggaaacat ctgtatggat tatttcacta taatcctatg atgcttggac 180
ttgaatcact tccagatccc acagacacct gggaaattat agagaccatt ggtaaaggca 240
cctatggcaa agtctacaag gtaactaaca agagagatgg gagcctggct gcagtgaaaa 300
ttctggatcc agtcagtgat atggatgaag aaattgaggc agaatacaac attttgcagt 360
tccttcctaa tcatcccaat gttgtaaagt tttatgggat gttttacaaa gcggatcact 420
gtgtaggggg acagctgtgg ctggtcctgg agctgtgtaa tgggggctca gtcacygagc 480
ttgtcaaagg tctactcaga tgtggccagc ggttggatga agcaatgatc tcatacatct 540
tgtacggggc cctcttgggc cttcagcatt tgcacaacaa ccgaatcatc caccgtgatg 600
tgaaggggaa taacattctt ctgacaacag aaggaggagt taagctcgtt gactttggtg 660
tttcagctca actcaccagt acacgtctgc ggagaaacac atctgttggc accccattct 720
ggatggcccc tgaggtcatt gcctgtgagc agcagtatga ctcttcctat gacgctcgct 780
gtgacgtctg gtccttgggg atcacagcta ttgaactggg ggatggagac cctcccctct 840
ttgacatgca tcctgtgaaa acactcttta agattccaag aaatcctcca cctactttac 900
ttcatccaga aaaatggtgt gaagaattca accactttat ttcacagtgt cttattaagg 960
attttgaaag gcgaccttcc gtcacacatc tccttgacca cccatttatt aaaggagtac 1020
atggaaaagt tctgtttctg caaaaacagc tggccraggt tctccaagac cagaagcatc 1080
aaaatcctgt tgctaaaacc aggcatgaga ggatgcatac cagaagacct tatcatgtgg 1140
aagatgctga aaaatactgc cttgaggatg atttggtcaa cctagaggtt ctggatgagg 1200
atacaattat ccatcagttg cagaaacgtt atgcagactt gctaatttac acatatgttg 1260
gagacatctt aattgcctta aaccccttcc agaatctaag catatactct ccacagtttt 1320
ccagacttta tcatggggtg aaacgcgcct ccaayccccc ccacatattt gcatcagcag 1380
atgctgctta ccagtgcatg gttactctca gcaaagacca gtgcattgtc atcagcggag 1440
agagtggctc tgggaagaca gaaagcgccc acctgattgt tcarcatttg actttcttgg 1500
gaaaggccaa taatcagacc ttgagagaga aaattctaca agtcaactcc ctggtggaag 1560
cctttgggaa ctcatgcact gccatcaatg acaactcgag ccgttttgga aaatatctgg 1620
aaatgatgtt tacaccaact ggagttgtga tgggggcaag aatctctgaa tatctcctgg 1680
aaaaatccag agttataaaa caggcagcga gagagaaaaa ttttcatata ttttactata 1740
tttatgctgg tcttcatcac caaaagaagc tttctgattt cagacttcct gaggaaaaac 1800
ctcctaggta catagctgat gaaactggaa gggtgatgca cgacataact tccaaggagt 1860
cttacagaag acaattcgaa gcaattcagc attgcttcag gattataggg ttcacggaca 1920
aagaggtgca ctcagtgtac agaattttgg ctgggatttt gaatattggg aacattgagt 1980
tcgcagctat ttcctctcaa catcagactg ataaaagtga ggtgcccaat gctgaagctt 2040
tgcaaaatgc tgcctctgtt ctgtgcatta gccctgaaga gctccaggag gccctcacct 2100
cccactgtgt ggtcacccgg ggcgagacca tcatccgtgc caacactgta gacagggctg 2160
cggacgttcg agacgccatg tccaaagccc tgtatgggag gctcttcagc tggattgtga 2220
atcgcattaa tacactcctg cagccagacg aaaacatatg tagtgcagga ggtggaatga 2280
atgtggggat cttggatatc tttggattcg agaattttca gagaaattca tttgagcagc 2340
tctgcataaa catcgccaat gagcaaatcc agtactattt caatcagcat gtttttgctc 2400
ttgagcagat ggaatatcag aatgaaggca ttgatgctat acccgtggaa tatgaggaca 2460
accgcccgct cctggacatg ttcctccaga aacccctggg actgcttgca cttttggatg 2520
aggaaagtcg gtttccccaa gcaactgacc agaccctggt tgataaattt gaagataatc 2580
tacgatgcaa atacttctgg aggcccaaag gagtggaact gtgctttggc nttcagcatt 2640
atgctggaaa ggtattatat gatgcttctg gggttcttga gaaaaataga gacactctcc 2700
ctgccgatgt ggttgtggtc ctgagaacgt cagaaaacaa gcttcttcag cagctcttct 2760
caatccctct gaccaaaaca ggtaatttgg cccagacaag agctaggata acagtggcct 2820
caagttcttt gcctccacat ttcagtgctg ggaaagccaa ggtggacact ctggaggtga 2880
tacggcatcc ggaagaaacc accaacatga agaggcaaac tgtggcttct tacttccggt 2940
attctctgat ggacctgctc tccaaaatgg tggttggaca gccccacttt gtgcgctgca 3000
ttaaacccaa tgatgaccga gaggccctgc agttctctcg agagagggtg ctggcccagc 3060
tccgctccac agggattctg gagacagtca gcatccgccg ccagggctat tcccaccgca 3120
tcctttttga agaatttgtg aaaaggtatt attacttggc attcacagca catcaaacac 3180
ctcttgctag caaagagagc tgtgtggcta tcttggaaaa gtccagatta gatcactggg 3240
tgctgggaaa aacaaaggtt tttctcaaat attaccatgt tgagcaayta aatttgctgc 3300
ttcgagaagt cataggcaga gtggttgtgc tgcaggcata taccaagggg tggcttggag 3360
ccaggagata caaaaaggtc agagagaaga gagagaaggg agccattgcc atccagtcag 3420
cctggagagg atatgatgct cggaggaaat ttaagaaaat aagcaacaga aggaatgagt 3480
ctgctgctca taatcaagca ggggccactt caaaccaaag cagtgggcca cattcccccg 3540
tcgcagcagg tacgagggga agtgccgagg ttcaagactg cagcgagcct ggtgaccata 3600
aagttctcag gggctctgta catcgtagga gccattcaca agcagaatcc aacaatggcc 3660
gtacacagac ttcaagcaac tctcctgctg tcacagagaa aaatgggcat tcacaagccc 3720
agagttctcc aaaagggtgc gatatcttcg caggacatgc aaacaaggta gctggatatc 3780
ttgattccaa agtaaatgtg tatcactcct tcagactcat ccaagttcat aggcatgaag 3840
cttgtctgcg gctgcgtggt tggaccatcc aaacttgaaa ctgttagtga tattttgaag 3900
tctttgagac aaaagcccag cttgctgaag aactttggtt cagtagagag acagggaggt 3960
acaggggaga gagaatcaaa agcctggaaa tttgctgctg agaataaatg ttagctgctc 4020
cctggnngna aaaa 4034



<210> SEQ ID NO 6

<211> LENGTH: 2925

<212> TYPE: DNA

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 6

atgagaaggg cggggatcgg cgaggactcc aggctggggt tgcaggccca gccaggggcg 60

gagccttctc cgggtcgggc ggggacagag cgctcccttg gaggcaccca gggacctggc 120

cagccgtgca gctgcccagg cgctatggcg agtgcggtca gggggtcgag gccgtggccc 180

cggctggggc tccagctcca gttcgcggcg ctgctgctcg ggacgctgag tccacaggtt 240

catactctca ggccagagaa cctcctgctg gtgtccacct tggatggaag tctccacgca 300

ctaagcaagc agacagggga cctgaagtgg actctgaggg atgatcccgt catcgaagga 360

ccaatgtacg tcacagaaat ggcctttctc tctgacccag cagatggcag cctgtacatc 420

ttggggaccc aaaaacaaca gggattaatg aaactgccat tcaccatccc tgagctggtt 480

catgcctctc cctgccgcag ctctgatggg gtcttctaca caggccggaa gcaggatgcc 540

tggtttgtgg tggaccctga gtcaggggag acccagatga cactgaccac agagggtccc 600

tccacccccc gcctctacat tggccgaaca cagtatacgg tcaccatgca tgacccaaga 660

gccccagccc tgcgctggaa caccacctac cgccgctact cagcgccccc catggatggc 720

tcacctggga aatacatgag ccacctggcg tcctgcggga tgggcctgct gctcactgtg 780 

gacccaggaa gcgggacggt gctgtggaca caggacctgg gcgtgcctgt gatgggcgtc 840

tacacctggc accaggacgg cctgcgccag ctgccgcatc tcacgctggc tcgagacact 900 

ctgcatttcc tcgccctccg ctggggccac atccgactgc ctgcctcagg cccccgggac 960 

acagccaccc tcttctctac cttggacacc cagctgctaa tgacgctgta tgtggggaag 1020 

gatgaaactg gcttctatgt ctytaaagca ctggtccaca caggagtggc cctggtgcct 1080 

cgtggactga ccctggcccc cgcagatggc cccaccacag atgaggtgac actccaagtc 1140 

tcaggagagc gagagggctc acccagcact gctgttagat acccctcagg cagtgtggcc 1200 

ctcccaagcc agtggctgct cattggacac cacgagctac ccccagtcct gcacaccacc 1260 

atgctgaggg tccatcccac cctggggagt ggaactgcag agacaagacc tccagagaat 1320 

acccaggccc cagccttctt cttggagcta ttgagcctga gccgagagaa actttgggac 1380 

tccgagctgc atccagaaga aaaaactcca gactcttact tggggctggg accccaagac 1440

ctgctggcag ctagcctcac tgctgtcctc ctgggagggt ggattctctt tgtgatgagg 1500 

cagcaacagc cgcaggtggt ggagaagcag caggagaccc ccctggcacc tgcagacttt 1560 

gctcacatct cccaggatgc ccagtccctg cactcggggg ccagccggag gagccagaag 1620 

aggcttcaga gtccctcaaa gcaagcccag ccactcgacg accctgaagc tgagcaactc 1680 

accgtagtgg ggaagatttc cttcaatccc aaggacgtgc tgggccgcgg ggcaggcggg 1740 

actttcgttt tccggggaca gtttgaggga cgggcagtgg ctgtcaagcg gctcctccgc 1800

gagtgctttg gcctggttcg gcgggaagtt caactgctgc aggagtctga caggcacccc 1860 

aacgtgctcc gctacttctg caccgagcgg ggaccccagt tccactacat tgccctggag 1920 

ctctgccggg cctccttgca ggagtacgta gaaaacccgg acctggatcg cgggggtctg 1980 

gagcccgagg tcgtgctgca gcagctgatg tctggcctgg cccacctgca ctctttacac 2040 

atagtgcacc gggacctgaa gccaggaaat attctcatca ccgggcctga cagccagggc 2100 

ctgggcagag tggtgctctc agacttcggc ctctgcaaga agctgcctgc tggccgctgt 2160 

agcttcagcc tccactccgg catccccggc acggaaggct ggatggcgcc cgagcttctg 2220 

cagctcctgc caccagacag tcctaccagc gctgtggaca tcttctctgc aggctgcgtg 2280 

ttctactacg tgctttctgg tggcagccac ccctttggag acagtcttta tcgccaggca 2340 

aacatcctca caggggctcc ctgtctggct cacctggagg aagaggtcca cgacaaggtg 2400 

gttgcccggg acctggttgg agccatgttg agcccactgc cgcagccacg cccctctgcc 2460 

ccccaggtgc tggcccaccc cttcttttgg agcagagcca agcaactcca gttcttccag 2520 

gacgtcagtg actggctgga gaaggagtcc gagcaggagc ccctggtgag ggcactggag 2580 

gcgggaggct gcgcagtggt ccgggacaac tggcacgagc acatctccat gccgctgcag 2640 

acagatctga gaaagttccg gtcctataag gggacatcag tgcgagacct gctccgtgct 2700 

gtgaggaaca agaagcacca ctacagggag ctcccagttg aggtgcgaca ggcactcggc 2760 

caagtccctg atggcttcgt ccagtacttc acaaaccgct tcccacggct gctcctccac 2820 

acgcaccgag ccatgaggag ctgcgcctct gagagcctct tcctgcccta ctacccgcca 2880 

gactcagagg ccaggaggcc atgccctggg gccacaggga ggtga 2925 



<210> SEQ ID NO 7 
<211> LENGTH: 974

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<220> FEATURE: 

<221> NAME/KEY: VARIANT 

<222> LOCATION: (1)...(974)

<223> OTHER INFORMATION: Xaa = Any Amino Acid
<400> SEQUENCE: 7 

Met Arg Arg Ala Gly lie Gly Glu Asp Ser Arg Leu Gly Leu Gin Ala 
1 5 10 15 

Gin Pro Gly Ala Glu Pro Ser Pro Gly Arg Ala Gly Thr Glu Arg Ser 
20 25 30 

Leu Gly Gly Thr Gin Gly Pro Gly Gin Pro Cys Ser Cys Pro Gly Ala 
35 40 45 

Met Ala Ser Ala Val Arg Gly Ser Arg Pro Trp Pro Arg Leu Gly Leu 
50 55 60 

Gin Leu Gin Phe Ala Ala Leu Leu Leu Gly Thr Leu Ser Pro Gin Val 
65 70 75 80 



His Thr Leu Arg Pro Glu Asn Leu Leu Leu Val Ser Thr Leu Asp Gly 
85 90 95 

Ser Leu His Ala Leu Ser Lys Gin Thr Gly Asp Leu Lys Trp Thr Leu 
100 105 110 

Arg Asp Asp Pro Val lie Glu Gly Pro Met Tyr Val Thr Glu Met Ala 
115 120 125 

Phe Leu Ser Asp Pro Ala Asp Gly Ser Leu Tyr lie Leu Gly Thr Gin 
130 135 .140 

Lys Gin Gin Gly Leu Met Lys Leu Pro Phe Thr lie Pro Glu Leu Val 
145 150 155 160 

His Ala Ser Pro Cys Arg Ser Ser Asp Gly Val Phe Tyr Thr Gly Arg 
165 170 175 

Lys Gin Asp Ala Trp Phe Val Val Asp Pro Glu Ser Gly Glu Thr Gin 
180 185 190 

Met Thr Leu Thr Thr Glu Gly Pro Ser Thr Pro Arg Leu Tyr He Gly 
195 200 205 

Arg Thr Gin Tyr Thr Val Thr Met His Asp Pro Arg Ala Pro Ala Leu 
210 215 220 

Arg Trp Asn Thr Thr Tyr Arg Arg Tyr Ser Ala Pro Pro Met Asp Gly 
225 230 235 240 

Ser Pro Gly Lys Tyr Met Ser His Leu Ala Ser Cys Gly Met Gly Leu 
245 250 255 

Leu Leu Thr Val Asp Pro Gly Ser Gly Thr Val Leu Trp Thr Gin Asp 
260 265 270 

Leu Gly Val Pro Val Met Gly Val Tyr Thr Trp His Gin Asp Gly Leu 
275 280 285 

Arg Gin Leu Pro His Leu Thr Leu Ala Arg Asp Thr Leu His Phe Leu 
290 295 300 

Ala Leu Arg Trp Gly His He Arg Leu Pro Ala Ser Gly Pro Arg Asp 
305 310 315 320 

Thr Ala Thr Leu Phe Ser Thr Leu Asp Thr Gin Leu Leu Met Thr Leu 
325 330 335 

Tyr Val Gly Lys Asp Glu Thr Gly Phe Tyr Val Xaa Lys Ala Leu Val 
340 345 350 



His Thr Gly Val Ala Leu Val Pro Arg Gly Leu Thr Leu Ala Pro Ala 
355 360 365 
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Asp Gly Pro Thr Thr Asp Glu Val Thr Leu Gin Val Ser Gly Glu Arg 
370 375 380 

Glu Gly Ser Pro Ser Thr Ala Val Arg Tyr Pro Ser Gly Ser Val Ala 
385. 390 ■ 395 400 

Leu Pro Ser Gin Trp Leu Leu lie Gly His His Glu Leu Pro Pro Val 
405 410 415 

Leu His Thr Thr Met Leu Arg Val His Pro Thr Leu Gly Ser Gly Thr 
420 425 430 

Ala Glu Thr Arg Pro Pro Glu Asn Thr Gin Ala Pro Ala Phe Phe Leu 
435 440 445 

Glu Leu Leu Ser Leu Ser Arg Glu Lys Leu Trp Asp Ser Glu Leu His 
450 455 460 

Pro Glu Glu Lys Thr Pro Asp Ser Tyr Leu Gly Leu Gly Pro Gln Asp
465 470 475 480

Leu Leu Ala Ala Ser Leu Thr Ala Val Leu Leu Gly Gly Trp Ile Leu
485 490 495

Phe Val Met Arg Gln Gln Gln Pro Gln Val Val Glu Lys Gln Gln Glu
500 505 510

Thr Pro Leu Ala Pro Ala Asp Phe Ala His Ile Ser Gln Asp Ala Gln
515 520 525

Ser Leu His Ser Gly Ala Ser Arg Arg Ser Gln Lys Arg Leu Gln Ser
530 535 540

Pro Ser Lys Gln Ala Gln Pro Leu Asp Asp Pro Glu Ala Glu Gln Leu
545 550 555 560

Thr Val Val Gly Lys Ile Ser Phe Asn Pro Lys Asp Val Leu Gly Arg
565 570 575

Gly Ala Gly Gly Thr Phe Val Phe Arg Gly Gln Phe Glu Gly Arg Ala
580 585 590

Val Ala Val Lys Arg Leu Leu Arg Glu Cys Phe Gly Leu Val Arg Arg
595 600 605

Glu Val Gln Leu Leu Gln Glu Ser Asp Arg His Pro Asn Val Leu Arg
610 615 620

Tyr Phe Cys Thr Glu Arg Gly Pro Gln Phe His Tyr Ile Ala Leu Glu
625 630 635 640

Leu Cys Arg Ala Ser Leu Gln Glu Tyr Val Glu Asn Pro Asp Leu Asp
645 650 655

Arg Gly Gly Leu Glu Pro Glu Val Val Leu Gln Gln Leu Met Ser Gly
660 665 670

Leu Ala His Leu His Ser Leu His Ile Val His Arg Asp Leu Lys Pro
675 680 685

Gly Asn Ile Leu Ile Thr Gly Pro Asp Ser Gln Gly Leu Gly Arg Val
690 695 700

Val Leu Ser Asp Phe Gly Leu Cys Lys Lys Leu Pro Ala Gly Arg Cys 
705 710 715 720 

Ser Phe Ser Leu His Ser Gly Ile Pro Gly Thr Glu Gly Trp Met Ala
725 730 735

Pro Glu Leu Leu Gln Leu Leu Pro Pro Asp Ser Pro Thr Ser Ala Val
740 745 750

Asp Ile Phe Ser Ala Gly Cys Val Phe Tyr Tyr Val Leu Ser Gly Gly
755 760 765

Ser His Pro Phe Gly Asp Ser Leu Tyr Arg Gln Ala Asn Ile Leu Thr
770 775 780

Gly Ala Pro Cys Leu Ala His Leu Glu Glu Glu Val His Asp Lys Val
785 790 795 800

Val Ala Arg Asp Leu Val Gly Ala Met Leu Ser Pro Leu Pro Gln Pro
805 810 815

Arg Pro Ser Ala Pro Gln Val Leu Ala His Pro Phe Phe Trp Ser Arg
820 825 830

Ala Lys Gln Leu Gln Phe Phe Gln Asp Val Ser Asp Trp Leu Glu Lys
835 840 845

Glu Ser Glu Gln Glu Pro Leu Val Arg Ala Leu Glu Ala Gly Gly Cys
850 855 860

Ala Val Val Arg Asp Asn Trp His Glu His Ile Ser Met Pro Leu Gln
865 870 875 880

Thr Asp Leu Arg Lys Phe Arg Ser Tyr Lys Gly Thr Ser Val Arg Asp
885 890 895

Leu Leu Arg Ala Val Arg Asn Lys Lys His His Tyr Arg Glu Leu Pro 
900 905 910 

Val Glu Val Arg Gln Ala Leu Gly Gln Val Pro Asp Gly Phe Val Gln
915 920 925 

Tyr Phe Thr Asn Arg Phe Pro Arg Leu Leu Leu His Thr His Arg Ala 
930 935 940 

Met Arg Ser Cys Ala Ser Glu Ser Leu Phe Leu Pro Tyr Tyr Pro Pro 
945 950 955 960 

Asp Ser Glu Ala Arg Arg Pro Cys Pro Gly Ala Thr Gly Arg
965 970

<210> SEQ ID NO 8

<211> LENGTH: 2769

<212> TYPE: DNA

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 8

atgagaaggg cggggatcgg cgaggactcc aggctggggt tgcaggccca gccaggggcg 60

gagccttctc cgggtcgggc ggggacagag cgctcccttg gaggcaccca gggacctggc 120

cagccgtgca gctgcccagg cgctatggcg agtgcggtca gggggtcgag gccgtggccc 180

cggctggggc tccagctcca gttcgcggcg ctgctgctcg ggacgctgag tccacaggtt 240

catactctca ggccagagaa cctcctgctg gtgtccacct tggatggaag tctccacgca 300

ctaagcaagc agacagggga cctgaagtgg actctgaggg atgatcccgt catcgaagga 360

ccaatgtacg tcacagaaat ggcctttctc tctgacccag cagatggcag cctgtacatc 420

ttggggaccc aaaaacaaca gggattaatg aaactgccat tcaccatccc tgagctggtt 480

catgcctctc cctgccgcag ctctgatggg gtcttctaca caggccggaa gcaggatgcc 540

tggtttgtgg tggaccctga gtcaggggag acccagatga cactgaccac agagggtccc 600

tccacccccc gcctctacat tggccgaaca cagtatacgg tcaccatgca tgacccaaga 660

gccccagccc tgcgctggaa caccacctac cgccgctact cagcgccccc catggatggc 720

tcacctggga aatacatgag ccacctggcg tcctgcggga tgggcctgct gctcactgtg 780

gacccaggaa gcgggacggt gctgtggaca caggacctgg gcgtgcctgt gatgggcgtc 840

tacacctggc accaggacgg cctgcgccag ctgccgcatc tcacgctggc tcgagacact 900

ctgcatttcc tcgccctccg ctggggccac atccgactgc ctgcctcagg cccccgggac 960

acagccaccc tcttctctac cttggacacc cagctgctaa tgacgctgta tgtggggaag 1020

gatgaaactg gcttctatgt ctytaaagca ctggtccaca caggagtggc cctggtgcct 1080

cgtggactga ccctggcccc cgcagatggc cccaccacag atgaggtgac actccaagtc 1140
tcaggagagc gagagggctc acccagcact gctgttagat acccctcagg cagtgtggcc 1200 

ctcccaagcc agtggctgct cattggacac cacgagctac ccccagtcct gcacaccacc 1260 

atgctgaggg tccatcccac cctggggagt ggaactgcag agacaagacc tccagagaat 1320 

acccaggccc cagccttctt cttggagcaa cagccgcagg tggtggagaa gcagcaggag 1380 

acccccctgg cacctgcaga ctttgctcac atctcccagg atgcccagtc cctgcactcg 1440 

ggggccagcc ggaggagcca gaagaggctt cagagtccct caaagcaagc ccagccactc 1500 

gacgaccctg aagctgagca actcaccgta gtggggaaga tttccttcaa tcccaaggac 1560 

gtgctgggcc gcggggcagg cgggactttc gttttccggg gacagtttga gggacgggca 1620 

gtggctgtca agcggctcct ccgcgagtgc tttggcctgg ttcggcggga agttcaactg 1680 

ctgcaggagt ctgacaggca ccccaacgtg ctccgctact tctgcaccga gcggggaccc 1740 

cagttccact acattgccct ggagctctgc cgggcctcct tgcaggagta cgtagaaaac 1800 

ccggacctgg atcgcggggg tctggagccc gaggtcgtgc tgcagcagct gatgtctggc 1860 

ctggcccacc tgcactcttt acacatagtg caccgggacc tgaagccagg aaatattctc 1920 

atcaccgggc ctgacagcca gggcctgggc agagtggtgc tctcagactt cggcctctgc 1980 

aagaagctgc ctgctggccg ctgtagcttc agcctccact ccggcatccc cggcacggaa 2040 

ggctggatgg cgcccgagct tctgcagctc ctgccaccag acagtcctac cagcgctgtg 2100 

gacatcttct ctgcaggctg cgtgttctac tacgtgcttt ctggtggcag ccaccccttt 2160 

ggagacagtc tttatcgcca ggcaaacatc ctcacagggg ctccctgtct ggctcacctg 2220 

gaggaagagg tccacgacaa ggtggttgcc cgggacctgg ttggagccat gttgagccca 2280 

ctgccgcagc cacgcccctc tgccccccag gtgctggccc accccttctt ttggagcaga 2340 

gccaagcaac tccagttctt ccaggacgtc agtgactggc tggagaagga gtccgagcag 2400 

gagcccctgg tgagggcact ggaggcggga ggctgcgcag tggtccggga caactggcac 2460 

gagcacatct ccatgccgct gcagacagat ctgagaaagt tccggtccta taaggggaca 2520 

tcagtgcgag acctgctccg tgctgtgagg aacaagaagc accactacag ggagctccca 2580 

gttgaggtgc gacaggcact cggccaagtc cctgatggct tcgtccagta cttcacaaac 2640 

cgcttcccac ggctgctcct ccacacgcac cgagccatga ggagctgcgc ctctgagagc 2700 

ctcttcctgc cctactaccc gccagactca gaggccagga ggccatgccc tggggccaca 2760 

gggaggtga 2769

<210> SEQ ID NO 9

<211> LENGTH: 922

<212> TYPE: PRT

<213> ORGANISM: homo sapiens

<220> FEATURE:

<221> NAME/KEY: VARIANT

<222> LOCATION: (1)...(922)

<223> OTHER INFORMATION: Xaa = Any Amino Acid

<400> SEQUENCE: 9

Met Arg Arg Ala Gly Ile Gly Glu Asp Ser Arg Leu Gly Leu Gln Ala
1 5 10 15

Gln Pro Gly Ala Glu Pro Ser Pro Gly Arg Ala Gly Thr Glu Arg Ser
20 25 30

Leu Gly Gly Thr Gln Gly Pro Gly Gln Pro Cys Ser Cys Pro Gly Ala
35 40 45

Met Ala Ser Ala Val Arg Gly Ser Arg Pro Trp Pro Arg Leu Gly Leu
50 55 60

Gln Leu Gln Phe Ala Ala Leu Leu Leu Gly Thr Leu Ser Pro Gln Val
65 70 75 80

His Thr Leu Arg Pro Glu Asn Leu Leu Leu Val Ser Thr Leu Asp Gly
85 90 95

Ser Leu His Ala Leu Ser Lys Gln Thr Gly Asp Leu Lys Trp Thr Leu
100 105 110

Arg Asp Asp Pro Val Ile Glu Gly Pro Met Tyr Val Thr Glu Met Ala
115 120 125

Phe Leu Ser Asp Pro Ala Asp Gly Ser Leu Tyr Ile Leu Gly Thr Gln
130 135 140

Lys Gln Gln Gly Leu Met Lys Leu Pro Phe Thr Ile Pro Glu Leu Val
145 150 155 160

His Ala Ser Pro Cys Arg Ser Ser Asp Gly Val Phe Tyr Thr Gly Arg
165 170 175

Lys Gln Asp Ala Trp Phe Val Val Asp Pro Glu Ser Gly Glu Thr Gln
180 185 190

Met Thr Leu Thr Thr Glu Gly Pro Ser Thr Pro Arg Leu Tyr Ile Gly
195 200 205

Arg Thr Gln Tyr Thr Val Thr Met His Asp Pro Arg Ala Pro Ala Leu
210 215 220

Arg Trp Asn Thr Thr Tyr Arg Arg Tyr Ser Ala Pro Pro Met Asp Gly
225 230 235 240

Ser Pro Gly Lys Tyr Met Ser His Leu Ala Ser Cys Gly Met Gly Leu
245 250 255

Leu Leu Thr Val Asp Pro Gly Ser Gly Thr Val Leu Trp Thr Gln Asp
260 265 270

Leu Gly Val Pro Val Met Gly Val Tyr Thr Trp His Gln Asp Gly Leu
275 280 285

Arg Gln Leu Pro His Leu Thr Leu Ala Arg Asp Thr Leu His Phe Leu
290 295 300

Ala Leu Arg Trp Gly His Ile Arg Leu Pro Ala Ser Gly Pro Arg Asp
305 310 315 320

Thr Ala Thr Leu Phe Ser Thr Leu Asp Thr Gln Leu Leu Met Thr Leu
325 330 335

Tyr Val Gly Lys Asp Glu Thr Gly Phe Tyr Val Xaa Lys Ala Leu Val
340 345 350

His Thr Gly Val Ala Leu Val Pro Arg Gly Leu Thr Leu Ala Pro Ala
355 360 365

Asp Gly Pro Thr Thr Asp Glu Val Thr Leu Gln Val Ser Gly Glu Arg
370 375 380

Glu Gly Ser Pro Ser Thr Ala Val Arg Tyr Pro Ser Gly Ser Val Ala
385 390 395 400

Leu Pro Ser Gln Trp Leu Leu Ile Gly His His Glu Leu Pro Pro Val
405 410 415

Leu His Thr Thr Met Leu Arg Val His Pro Thr Leu Gly Ser Gly Thr
420 425 430

Ala Glu Thr Arg Pro Pro Glu Asn Thr Gln Ala Pro Ala Phe Phe Leu
435 440 445

Glu Gln Gln Pro Gln Val Val Glu Lys Gln Gln Glu Thr Pro Leu Ala
450 455 460

Pro Ala Asp Phe Ala His Ile Ser Gln Asp Ala Gln Ser Leu His Ser
465 470 475 480

Gly Ala Ser Arg Arg Ser Gln Lys Arg Leu Gln Ser Pro Ser Lys Gln
485 490 495

Ala Gln Pro Leu Asp Asp Pro Glu Ala Glu Gln Leu Thr Val Val Gly
500 505 510

Lys Ile Ser Phe Asn Pro Lys Asp Val Leu Gly Arg Gly Ala Gly Gly
515 520 525

Thr Phe Val Phe Arg Gly Gln Phe Glu Gly Arg Ala Val Ala Val Lys
530 535 540

Arg Leu Leu Arg Glu Cys Phe Gly Leu Val Arg Arg Glu Val Gln Leu
545 550 555 560

Leu Gln Glu Ser Asp Arg His Pro Asn Val Leu Arg Tyr Phe Cys Thr
565 570 575

Glu Arg Gly Pro Gln Phe His Tyr Ile Ala Leu Glu Leu Cys Arg Ala
580 585 590

Ser Leu Gln Glu Tyr Val Glu Asn Pro Asp Leu Asp Arg Gly Gly Leu
595 600 605

Glu Pro Glu Val Val Leu Gln Gln Leu Met Ser Gly Leu Ala His Leu
610 615 620

His Ser Leu His Ile Val His Arg Asp Leu Lys Pro Gly Asn Ile Leu
625 630 635 640

Ile Thr Gly Pro Asp Ser Gln Gly Leu Gly Arg Val Val Leu Ser Asp
645 650 655

Phe Gly Leu Cys Lys Lys Leu Pro Ala Gly Arg Cys Ser Phe Ser Leu
660 665 670

His Ser Gly Ile Pro Gly Thr Glu Gly Trp Met Ala Pro Glu Leu Leu
675 680 685

Gln Leu Leu Pro Pro Asp Ser Pro Thr Ser Ala Val Asp Ile Phe Ser
690 695 700

Ala Gly Cys Val Phe Tyr Tyr Val Leu Ser Gly Gly Ser His Pro Phe
705 710 715 720

Gly Asp Ser Leu Tyr Arg Gln Ala Asn Ile Leu Thr Gly Ala Pro Cys
725 730 735

Leu Ala His Leu Glu Glu Glu Val His Asp Lys Val Val Ala Arg Asp
740 745 750

Leu Val Gly Ala Met Leu Ser Pro Leu Pro Gln Pro Arg Pro Ser Ala
755 760 765

Pro Gln Val Leu Ala His Pro Phe Phe Trp Ser Arg Ala Lys Gln Leu
770 775 780

Gln Phe Phe Gln Asp Val Ser Asp Trp Leu Glu Lys Glu Ser Glu Gln
785 790 795 800

Glu Pro Leu Val Arg Ala Leu Glu Ala Gly Gly Cys Ala Val Val Arg
805 810 815

Asp Asn Trp His Glu His Ile Ser Met Pro Leu Gln Thr Asp Leu Arg
820 825 830

Lys Phe Arg Ser Tyr Lys Gly Thr Ser Val Arg Asp Leu Leu Arg Ala
835 840 845

Val Arg Asn Lys Lys His His Tyr Arg Glu Leu Pro Val Glu Val Arg
850 855 860

Gln Ala Leu Gly Gln Val Pro Asp Gly Phe Val Gln Tyr Phe Thr Asn
865 870 875 880

Arg Phe Pro Arg Leu Leu Leu His Thr His Arg Ala Met Arg Ser Cys
885 890 895

Ala Ser Glu Ser Leu Phe Leu Pro Tyr Tyr Pro Pro Asp Ser Glu Ala
900 905 910

Arg Arg Pro Cys Pro Gly Ala Thr Gly Arg
915 920



<210> SEQ ID NO 10 

<211> LENGTH: 768 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 10 

atgagaaggg cggggatcgg cgaggactcc aggctggggt tgcaggccca gccaggggcg 60 

gagccttctc cgggtcgggc ggggacagag cgctcccttg gaggcaccca gggacctggc 120 

cagccgtgca gctgcccagg cgctatggcg agtgcggtca gggggtcgag gccgtggccc 180 

cggctggggc tccagctcca gttcgcggcg ctgctgctcg ggacgctgag tccacaggtt 240 

catactctca ggccagagaa cctcctgctg gtgtccacct tggatggaag tctccacgca 300 

ctaagcaagc agacagggga cctgaagtgg actctgaggg atgatcccgt catcgaagga 360 

ccaatgtacg tcacagaaat ggcctttctc tctgacccag cagatggcag cctgtacatc 420 

ttggggaccc aaaaacaaca gggattaatg aaactgccat tcaccatccc tgagctggtt 480 

catgcctctc cctgccgcag ctctgatggg gtcttctaca caggccggaa gcaggatgcc 540

tggtttgtgg tggaccctga gtcaggggag acccagatga cactgaccac agagggtccc 600

tccacccccc gcctctacat tggccgaaca cagtatacgg tcaccatgca tgacccaaga 660

gccccagccc tgcgctggaa caccacctac cgccgctact cagcgccccc catggatggc 720 

tcacctggga aatataaccc tccatgtgat ctccacacac cagactga 768 

<210> SEQ ID NO 11 

<211> LENGTH: 255 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 11 

Met Arg Arg Ala Gly Ile Gly Glu Asp Ser Arg Leu Gly Leu Gln Ala
1 5 10 15

Gln Pro Gly Ala Glu Pro Ser Pro Gly Arg Ala Gly Thr Glu Arg Ser
20 25 30

Leu Gly Gly Thr Gln Gly Pro Gly Gln Pro Cys Ser Cys Pro Gly Ala
35 40 45

Met Ala Ser Ala Val Arg Gly Ser Arg Pro Trp Pro Arg Leu Gly Leu
50 55 60

Gln Leu Gln Phe Ala Ala Leu Leu Leu Gly Thr Leu Ser Pro Gln Val
65 70 75 80

His Thr Leu Arg Pro Glu Asn Leu Leu Leu Val Ser Thr Leu Asp Gly
85 90 95

Ser Leu His Ala Leu Ser Lys Gln Thr Gly Asp Leu Lys Trp Thr Leu
100 105 110

Arg Asp Asp Pro Val Ile Glu Gly Pro Met Tyr Val Thr Glu Met Ala
115 120 125

Phe Leu Ser Asp Pro Ala Asp Gly Ser Leu Tyr Ile Leu Gly Thr Gln
130 135 140

Lys Gln Gln Gly Leu Met Lys Leu Pro Phe Thr Ile Pro Glu Leu Val
145 150 155 160

His Ala Ser Pro Cys Arg Ser Ser Asp Gly Val Phe Tyr Thr Gly Arg
165 170 175

Lys Gln Asp Ala Trp Phe Val Val Asp Pro Glu Ser Gly Glu Thr Gln
180 185 190

Met Thr Leu Thr Thr Glu Gly Pro Ser Thr Pro Arg Leu Tyr Ile Gly
195 200 205

Arg Thr Gln Tyr Thr Val Thr Met His Asp Pro Arg Ala Pro Ala Leu
210 215 220

Arg Trp Asn Thr Thr Tyr Arg Arg Tyr Ser Ala Pro Pro Met Asp Gly
225 230 235 240

Ser Pro Gly Lys Tyr Asn Pro Pro Cys Asp Leu His Thr Pro Asp
245 250 255
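
As an illustration only (it is not part of the Sequence Listing), the correspondence between a listed DNA sequence and its listed protein can be checked by translating codons with the standard genetic code. The following is a minimal Python sketch; the function name translate and the choice to report unresolved codons as "X" are assumptions made for this example, and only short fragments of SEQ ID NOS: 8 and 10 are used.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string (spaces allowed) until the first stop codon.

    Codons containing an ambiguity code (for example 'y') are reported
    as 'X', mirroring the Xaa convention of the listing.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon ends the open reading frame
            break
        protein.append(aa)
    return "".join(protein)


# First 30 nucleotides of SEQ ID NO: 10 (Met Arg Arg Ala Gly Ile Gly Glu Asp Ser).
print(translate("atgagaaggg cggggatcgg cgaggactcc"))
# Last 9 nucleotides of SEQ ID NO: 10 (Pro Asp, then a stop codon).
print(translate("ccagactga"))
```

The one-letter output can be compared residue by residue with the three-letter listing; an ambiguous codon such as "tyt" in SEQ ID NO: 8 is reported as X, consistent with the Xaa entry at the corresponding position of SEQ ID NO: 9.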



What is claimed is:

1. An isolated nucleic acid molecule comprising the
nucleotide sequence described in SEQ ID NO: 6.

2. An isolated nucleic acid molecule comprising a nucleotide
sequence that:

(a) encodes the amino acid sequence shown in SEQ ID
NO: 7; and

(b) hybridizes under stringent conditions with washing in
0.1xSSC/0.1% SDS at 68° C. to the nucleotide
sequence of SEQ ID NO: 6 or the complement thereof.

3. An isolated nucleic acid molecule comprising the
nucleotide sequence encoding the amino acid sequence of
SEQ ID NO: 7.

4. A recombinant expression vector comprising the isolated
nucleic acid molecule of claim 3.

5. A host cell comprising the recombinant expression
vector of claim 4.

* * * * *

HUMAN KINASES AND
POLYNUCLEOTIDES ENCODING THE 
SAME 

The present application claims the benefit of U.S. Pro- 
visional Application No. 60/239,821, which was filed on 
Oct. 12, 2000, and is herein incorporated by reference in its 
entirety. 

1. INTRODUCTION 

The present invention relates to the discovery, 
identification, and characterization of novel human poly- 
nucleotides encoding proteins sharing sequence similarity 
with animal kinases. The invention encompasses the 
described polynucleotides, host cell expression systems, the 
encoded proteins, fusion proteins, polypeptides and 
peptides, antibodies to the encoded proteins and peptides, 
and genetically engineered animals that either lack or over 
express the disclosed genes, antagonists and agonists of the 
proteins, and other compounds that modulate the expression 
or activity of the proteins encoded by the disclosed genes 
that can be used for diagnosis, drug screening, clinical trial 
monitoring, the treatment of diseases and disorders, and 
cosmetic or nutriceutical applications. 

2. BACKGROUND OF THE INVENTION 

Kinases mediate the phosphorylation of a wide variety of 
proteins and compounds in the cell. Along with 
phosphatases, kinases are involved in a range of regulatory 
pathways. Given the physiological importance of kinases, 
they have been subject to intense scrutiny and are proven 
drug targets. 

3. SUMMARY OF THE INVENTION 

The present invention relates to the discovery, 
identification, and characterization of nucleotides that 
encode novel human proteins and the corresponding amino 
acid sequences of these proteins. The novel human proteins 
(NHPs) described for the first time herein share structural 
similarity with animal kinases, including, but not limited to, 
serine-threonine kinases, calcium/calmodulin-dependent 
protein kinases, and mitogen activated kinases. Accordingly, 
the described NHPs encode novel kinases having homo- 
logues and orthologs across a range of phyla and species. 

The novel human polynucleotides described herein
encode open reading frames (ORFs) encoding proteins of
766 and 765 amino acids in length (see SEQ ID
NOS: 2 and 4, respectively).

The invention also encompasses agonists and antagonists 
of the described NHPs, including small molecules, large 
molecules, mutant NHPs, or portions thereof, that compete 
with native NHP, peptides, and antibodies, as well as nucle- 
otide sequences that can be used to inhibit the expression of 
the described NHPs (e.g., antisense and ribozyme 
molecules, and open reading frame or regulatory sequence 
replacement constructs) or to enhance the expression of the 
described NHPs (e.g., expression constructs that place the 
described polynucleotide under the control of a strong 
promoter system), and transgenic animals that express a 
NHP sequence, or "knock-outs" (which can be conditional) 
that do not express a functional NHP. Knock-out mice can 
be produced in several ways, one of which involves the use 
of mouse embryonic stem cell ("ES cell") lines that
contain gene trap mutations in a murine homolog of at least 
one of the described NHPs. When the unique NHP
sequences described in SEQ ID NOS: 1-4 are "knocked-out"
they provide a method of identifying phenotypic expression
of the particular gene as well as a method of assigning
function to previously unknown genes. In addition, animals
in which the unique NHP sequences described in SEQ ID
NOS: 1-4 are "knocked-out" provide a unique source in
which to elicit antibodies to homologous and orthologous
proteins that would have been previously viewed by the
immune system as "self" and therefore would have failed to
elicit significant antibody responses. To these ends, gene
trapped knockout ES cells have been generated in murine
homologs of the described NHPs.

Additionally, the unique NHP sequences described in
SEQ ID NOS: 1-4 are useful for the identification of protein
coding sequence and mapping a unique gene to a particular
chromosome. These sequences identify biologically verified
exon splice junctions as opposed to splice junctions that may
have been bioinformatically predicted from genomic
sequence alone. The sequences of the present invention are
also useful as additional DNA markers for restriction fragment
length polymorphism (RFLP) analysis, and in forensic
biology.

Further, the present invention also relates to processes for
identifying compounds that modulate, i.e., act as agonists or
antagonists of, NHP expression and/or NHP activity that
utilize purified preparations of the described NHPs and/or
NHP product, or cells expressing the same. Such compounds
can be used as therapeutic agents for the treatment of any of
a wide variety of symptoms associated with biological
disorders or imbalances.

4. DESCRIPTION OF THE SEQUENCE LISTING 
AND FIGURES 

The Sequence Listing provides the sequence of the novel
human ORFs encoding the described novel human kinase 
proteins. 

5. DETAILED DESCRIPTION OF THE 
INVENTION 

The NHPs described for the first time herein are novel
proteins that are expressed in, inter alia, human cell lines and
human fetal brain, brain, pituitary, spinal cord, testis,
adipose, and esophagus cells. The described sequences were
compiled from sequences available in GENBANK, and
cDNAs generated from skeletal muscle, adipose, pituitary,
cerebellum, and brain mRNA (Edge Biosystems,
Gaithersburg, Md.).

The present invention encompasses the nucleotides presented
in the Sequence Listing, host cells expressing such
nucleotides, the expression products of such nucleotides,
and: (a) nucleotides that encode mammalian homologs of
the described genes, including the specifically described
NHPs, and the NHP products; (b) nucleotides that encode
one or more portions of an NHP that correspond to functional
domains, and the polypeptide products specified by
such nucleotide sequences, including but not limited to the
novel regions of any active domain(s); (c) isolated nucleotides
that encode mutant versions, engineered or naturally
occurring, of the described NHPs in which all or a part of at
least one domain is deleted or altered, and the polypeptide
products specified by such nucleotide sequences, including
but not limited to soluble proteins and peptides in which all
or a portion of the signal sequence is deleted; (d) nucleotides
that encode chimeric fusion proteins containing all or a
portion of a coding region of a NHP, or one of its domains 
(e.g., a receptor/ligand binding domain, accessory protein/
self-association domain, etc.) fused to another peptide or
polypeptide; or (e) therapeutic or diagnostic derivatives of
the described polynucleotides such as oligonucleotides, antisense
polynucleotides, ribozymes, dsRNA, or gene therapy
constructs comprising a sequence first disclosed in the
Sequence Listing. As discussed above, the present invention
includes: (a) the human DNA sequences presented in the
Sequence Listing (and vectors comprising the same) and
additionally contemplates any nucleotide sequence encoding
a contiguous NHP open reading frame (ORF) that hybridizes
to a complement of a DNA sequence presented in the
Sequence Listing under highly stringent conditions, e.g.,
hybridization to filter-bound DNA in 0.5 M NaHPO4, 1%
sodium dodecyl sulfate (SDS), 1 mM EDTA at 65° C., and
washing in 0.1xSSC/0.1% SDS at 68° C. (Ausubel et al.,
eds., 1989, Current Protocols in Molecular Biology, Vol. I,
Green Publishing Associates, Inc., and John Wiley & Sons,
Inc., New York, at p. 2.10.3) and encodes a functionally
equivalent expression product. Additionally, contemplated
are any nucleotide sequences that hybridize to the complement
of the DNA sequence that encode and express an
amino acid sequence presented in the Sequence Listing
under moderately stringent conditions, e.g., washing in
0.2xSSC/0.1% SDS at 42° C. (Ausubel et al., 1989, supra),
yet still encode a functionally equivalent NHP product.
Functional equivalents of a NHP include naturally occurring
NHPs present in other species and mutant NHPs whether
naturally occurring or engineered (by site directed
mutagenesis, gene shuffling, directed evolution as described
in, for example, U.S. Pat. Nos. 5,837,458 or 5,723,323, both
of which are herein incorporated by reference). The invention
also includes degenerate nucleic acid variants of the
disclosed NHP polynucleotide sequences.

Additionally contemplated are polynucleotides encoding
NHP ORFs, or their functional equivalents, encoded by
polynucleotide sequences that are about 99, 95, 90, or about
85 percent similar to corresponding regions of SEQ ID NO: 1
(as measured by BLAST sequence comparison analysis
using, for example, the GCG sequence analysis package
using default parameters).

The invention also includes nucleic acid molecules, preferably
DNA molecules, that hybridize to, and are therefore
the complements of, the described NHP encoding polynucleotides.
Such hybridization conditions can be highly stringent
or less highly stringent, as described above. In instances
where the nucleic acid molecules are deoxyoligonucleotides
("DNA oligos"), such molecules are generally about 16 to
about 100 bases long, or about 20 to about 80, or about 34
to about 45 bases long, or any variation or combination of
sizes represented therein that incorporate a contiguous
region of sequence first disclosed in the Sequence Listing.
Such oligonucleotides can be used in conjunction with the
polymerase chain reaction (PCR) to screen libraries, isolate
clones, and prepare cloning and sequencing templates, etc.

Alternatively, such NHP oligonucleotides can be used as
hybridization probes for screening libraries, and assessing
gene expression patterns (particularly using a microarray or
high-throughput "chip" format). Additionally, a series of the
described NHP oligonucleotide sequences, or the complements
thereof, can be used to represent all or a portion of the
described NHP sequences. An oligonucleotide or polynucleotide
sequence first disclosed in at least a portion of one or
more of the sequences of SEQ ID NOS: 1-4 can be used as
a hybridization probe in conjunction with a solid support
matrix/substrate (resins, beads, membranes, plastics,
polymers, metal or metallized substrates, crystalline or polycrystalline
substrates, etc.). Of particular note are spatially
addressable arrays (i.e., gene chips, microtiter plates, etc.) of
oligonucleotides and polynucleotides, or corresponding oligopeptides
and polypeptides, wherein at least one of the
biopolymers present on the spatially addressable array comprises
an oligonucleotide or polynucleotide sequence first
disclosed in at least one of the sequences of SEQ ID NOS:
1-4, or an amino acid sequence encoded thereby. Methods
for attaching biopolymers to, or synthesizing biopolymers
on, solid support matrices, and conducting binding studies
thereon are disclosed in, inter alia, U.S. Pat. Nos. 5,700,637,
5,556,752, 5,744,305, 4,631,211, 5,445,934, 5,252,743,
4,713,326, 5,424,186, and 4,689,405, the disclosures of
which are herein incorporated by reference in their entirety.

Addressable arrays comprising sequences first disclosed
in SEQ ID NOS: 1-4 can be used to identify and characterize
the temporal and tissue specific expression of a gene. These
addressable arrays incorporate oligonucleotide sequences of
sufficient length to confer the required specificity, yet be
within the limitations of the production technology. The
length of these probes is within a range of between about 8
to about 2000 nucleotides. Preferably the probes consist of
60 nucleotides and more preferably 25 nucleotides from the
sequences first disclosed in SEQ ID NOS: 1-4.

For example, a series of the described oligonucleotide
sequences, or the complements thereof, can be used in chip
format to represent all or a portion of the described
sequences. The oligonucleotides, typically between about 16
to about 40 (or any whole number within the stated range)
nucleotides in length can partially overlap each other and/or
the sequence may be represented using oligonucleotides that
do not overlap. Accordingly, the described polynucleotide
sequences shall typically comprise at least about two or
three distinct oligonucleotide sequences of at least about 8
nucleotides in length that are each first disclosed in the
described Sequence Listing. Such oligonucleotide
sequences can begin at any nucleotide present within a
sequence in the Sequence Listing and proceed in either a
sense (5'-to-3') orientation vis-a-vis the described sequence
or in an antisense orientation.

Microarray-based analysis allows the discovery of broad
patterns of genetic activity, providing new understanding of
gene functions and generating novel and unexpected insight
into transcriptional processes and biological mechanisms.
The use of addressable arrays comprising sequences first
disclosed in SEQ ID NOS: 1-4 provides detailed information
about transcriptional changes involved in a specific pathway,
potentially leading to the identification of novel components
or gene functions that manifest themselves as novel phenotypes.

Probes consisting of sequences first disclosed in SEQ ID
NOS: 1-4 can also be used in the identification, selection and
validation of novel molecular targets for drug discovery. The
use of these unique sequences permits the direct confirmation
of drug targets and recognition of drug dependent
changes in gene expression that are modulated through
pathways distinct from the drug's intended target. These
unique sequences therefore also have utility in defining and
monitoring both drug action and toxicity.

As an example of utility, the sequences first disclosed in
SEQ ID NOS: 1-4 can be utilized in microarrays or other
assay formats, to screen collections of genetic material from
patients who have a particular medical condition. These
investigations can also be carried out using the sequences
first disclosed in SEQ ID NOS: 1-4 in silico and by comparing
previously collected genetic databases and the disclosed
sequences using computer software known to those in
the art.

Thus the sequences first disclosed in SEQ ID NOS: 1-4
can be used to identify mutations associated with a particular
disease and also as a diagnostic or prognostic assay.
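
The chip-format representation described above, in which a sequence is covered by a series of oligonucleotides that may or may not overlap and that can run in a sense or antisense orientation, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names tile_oligos and reverse_complement, the 16-nucleotide length, and the 7-nucleotide step are choices made for the example and are not taken from the specification.

```python
# Complement map for unambiguous DNA bases (lower and upper case).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    """Return the antisense (reverse-complement) strand of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_oligos(seq, length=25, step=10):
    """Return oligos of a fixed length starting every `step` nucleotides.

    A step smaller than the length gives partially overlapping oligos;
    a step equal to the length gives oligos that do not overlap.
    """
    if length > len(seq):
        return []
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


# Hypothetical input: the first 30 nucleotides of a listed sequence.
fragment = "atgagaagggcggggatcggcgaggactcc"
sense = tile_oligos(fragment, length=16, step=7)
antisense = [reverse_complement(oligo) for oligo in sense]
for s, a in zip(sense, antisense):
    print(s, a)
```

With these example settings the 30-nucleotide fragment yields three 16-mers that overlap by nine nucleotides; setting step equal to length would instead produce a non-overlapping series.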

Although the presently described sequences have been 
specifically described using nucleotide sequence, it should 
be appreciated that each of the sequences can uniquely be 
described using any of a wide variety of additional structural 
attributes, or combinations thereof. For example, a given 
sequence can be described by the net composition of the 
nucleotides present within a given region of the sequence in 
conjunction with the presence of one or more specific 
oligonucleotide sequence(s) first disclosed in the SEQ ID 
NOS: 1-4. Alternatively, a restriction map specifying the 
relative positions of restriction endonuclease digestion sites, 
or various palindromic or other specific oligonucleotide 
sequences can be used to structurally describe a given 
sequence. Such restriction maps, which are typically gener- 
ated by widely available computer programs (e.g., the Uni- 
versity of Wisconsin GCG sequence analysis package, 
SEQUENCHER 3.0, Gene Codes Corp., Ann Arbor, Mich., 
etc.), can optionally be used in conjunction with one or more 
discrete nucleotide sequence(s) present in the sequence that 
can be described by the relative position of the sequence 
relative to one or more additional sequence(s) or one or more 
restriction sites present in the disclosed sequence. 
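
A restriction map of the kind described above can be sketched in a few lines of code. The following minimal Python example is an illustration only and is not the GCG or SEQUENCHER software named in the text; the function name restriction_map, the choice of three enzymes, and the test sequence are assumptions made for the example. The recognition sites shown for EcoRI, BamHI, and HindIII are the commonly cited ones.

```python
# Recognition sequences for three commonly used restriction endonucleases.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def restriction_map(seq, sites=SITES):
    """Return 1-based start positions of each recognition site in seq."""
    seq = seq.upper()
    result = {}
    for name, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start + 1)  # report positions as 1-based
            start = seq.find(motif, start + 1)  # allow overlapping sites
        result[name] = positions
    return result


# Hypothetical test sequence containing two EcoRI sites and one BamHI site.
print(restriction_map("ttgaattcaaggatccgaattc"))
```

The relative positions returned for each enzyme, taken together, give one structural description of the sequence in the sense discussed above.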

For oligonucleotide probes, highly stringent conditions 
may refer, e.g., to washing in 6xSSC/0.05% sodium pyro- 
phosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base 
oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base 
oligos). These nucleic acid molecules may encode or act as 
NHP gene antisense molecules, useful, for example, in NHP 
gene regulation (for and/or as antisense primers in amplifi- 
cation reactions of NHP gene nucleic acid sequences). With 
respect to NHP gene regulation, such techniques can be used 
to regulate biological functions. Further, such sequences can 
be used as part of ribozyme and/or triple helix sequences that 
are also useful for NHP gene regulation. 

Inhibitory antisense or double stranded oligonucleotides 
can additionally comprise at least one modified base moiety 
that is selected from the group including but not limited to 
5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, 
hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl) uracil,
5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil, beta-D-
galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine,
5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil, 5-methoxyaminomethyl-2- 
thiouracil, beta-D-mannosylqueosine, 
5'-methoxycarboxymethyluracil, 5-methoxyuracil, 
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic 
acid (v), wybutoxosine, pseudouracil, queosine, 
2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid 
methylester, uracil-5-oxyacetic acid (v), 5-methyl-2- 
thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3) 
w, and 2,6-diaminopurine. 

The antisense oligonucleotide can also comprise at least 
one modified sugar moiety selected from the group includ- 
ing but not limited to arabinose, 2-fluoroarabinose, xylulose, 
and hexose. 

In yet another embodiment, the antisense oligonucleotide 
will comprise at least one modified phosphate backbone 
selected from the group consisting of a phosphorothioate, a
phosphorodithioate, a phosphoramidothioate, a
phosphoramidate, a phosphordiamidate, a 
methylphosphonate, an alkyl phosphotriester, and a formac- 
etal or analog thereof. 

In yet another embodiment, the antisense oligonucleotide
is an α-anomeric oligonucleotide. An α-anomeric oligonucleotide
forms specific double-stranded hybrids with
complementary RNA in which, contrary to the usual β-units,
the strands run parallel to each other (Gautier et al., 1987,
Nucl. Acids Res. 15:6625-6641). The oligonucleotide is a
2'-O-methylribonucleotide (Inoue et al., 1987, Nucl. Acids
Res. 15:6131-6148), or a chimeric RNA-DNA analogue
(Inoue et al., 1987, FEBS Lett. 215:327-330). Alternatively,
double stranded RNA can be used to disrupt the expression
and function of a targeted NHP.

Oligonucleotides of the invention can be synthesized by 
standard methods known in the art, e.g., by use of an 
automated DNA synthesizer (such as are commercially 
available from Biosearch, Applied Biosystems, etc.). As 
examples, phosphorothioate oligonucleotides can be synthe- 
sized by the method of Stein et al. (1988, Nucl. Acids Res. 
16:3209), and methylphosphonate oligonucleotides can be 
prepared by use of controlled pore glass polymer supports 
(Sarin et al., 1988, Proc. Natl. Acad. Sci. U.S.A. 
85:7448-7451), etc. 

Low stringency conditions are well known to those of 
skill in the art, and will vary predictably depending on the 
specific organisms from which the library and the labeled 
sequences are derived. For guidance regarding such condi- 
tions see, for example, Sambrook et al., 1989, Molecular 
Cloning, A Laboratory Manual (and periodic updates 
thereof), Cold Spring Harbor Press, N.Y.; and Ausubel et 
al., 1989, supra. 

Alternatively, suitably labeled NHP nucleotide probes can 
be used to screen a human genomic library using appropri- 
ately stringent conditions or by PCR. The identification and 
characterization of human genomic clones is helpful for 
identifying polymorphisms (including, but not limited to, 
nucleotide repeats, microsatellite alleles, single nucleotide 
polymorphisms, or coding single nucleotide 
polymorphisms), determining the genomic structure of a 
given locus/allele, and designing diagnostic tests. For 
example, sequences derived from regions adjacent to the 
intron/exon boundaries of the human gene can be used to 
design primers for use in amplification assays to detect 
mutations within the exons, introns, splice sites (e.g., splice 
acceptor and/or donor sites), etc., that can be used in 
diagnostics and pharmacogenomics. 

For example, the present sequences can be used in restric- 
tion fragment length polymorphism (RFLP) analysis to 
identify specific individuals. In this technique, an individu- 
al's genomic DNA is digested with one or more restriction 
enzymes, and probed on a Southern blot to yield unique 
bands for identification (as generally described in U.S. Pat. 
No. 5,272,057, incorporated herein by reference). In 
addition, the sequences of the present invention can be used 
to provide polynucleotide reagents, e.g., PCR primers, tar- 
geted to specific loci in the human genome, which can 
enhance the reliability of DNA-based forensic identifica- 
tions by, for example, providing another "identification 
marker" (i.e., another DNA sequence that is unique to a 
particular individual). Actual base sequence information can 
be used for identification as an accurate alternative to 
patterns formed by restriction enzyme generated fragments. 
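
As an illustration only, the RFLP comparison described above can be sketched in silico. The Python sketch below digests two short, hypothetical allele sequences at a recognition site and compares the resulting fragment lengths. The sequences, the function name digest, and the choice of the EcoRI site (GAATTC, cut after the first base) are assumptions made for this example and are not taken from the Sequence Listing.

```python
import re

def digest(sequence, site, cut_offset):
    """Return fragment lengths after cutting a linear sequence at each site.

    cut_offset is the position of the cut within the recognition site
    (1 for EcoRI, which cuts G^AATTC).
    """
    sequence = sequence.upper()
    # A lookahead finds every occurrence, including overlapping ones.
    pattern = "(?=" + re.escape(site.upper()) + ")"
    cuts = [m.start() + cut_offset for m in re.finditer(pattern, sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:])]

# Hypothetical alleles (illustrative only): allele_b has a single-base
# change (A to G) that destroys the second EcoRI site.
allele_a = "AAAA" + "GAATTC" + "CCCCCCC" + "GAATTC" + "TTTT"
allele_b = "AAAA" + "GAATTC" + "CCCCCCC" + "GAGTTC" + "TTTT"

fragments_a = digest(allele_a, "GAATTC", 1)
fragments_b = digest(allele_b, "GAATTC", 1)
print(fragments_a, fragments_b)
```

Differing fragment-length lists for the two alleles correspond to the differing band patterns that would be probed on a Southern blot.
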

Further, a NHP gene homolog can be isolated from 
nucleic acid from an organism of interest by performing 
PCR using two degenerate or "wobble" oligonucleotide 
primer pools designed on the basis of amino acid sequences 
within the NHP products disclosed herein. The template for 
the reaction may be total RNA, mRNA, and/or cDNA 
obtained by reverse transcription of mRNA prepared from, 
for example, human or non-human cell lines or tissue known 
or suspected to express an allele of a NHP gene. 
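
By way of a hedged sketch, the design of a degenerate ("wobble") primer pool from a short amino acid sequence can be illustrated as follows. The code assumes the standard genetic code and the IUPAC nucleotide ambiguity codes; the function name degenerate_primer and the example peptide are hypothetical and do not come from the Sequence Listing. The returned pool size is the number of distinct oligonucleotides represented by the degenerate sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes keyed by the alphabetically sorted set of bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}

def degenerate_primer(peptide):
    """Return (degenerate sense-strand primer, pool size) for a peptide."""
    primer = []
    pool_size = 1
    for residue in peptide.upper():
        codons = [c for c, aa in CODON_TABLE.items() if aa == residue]
        if not codons:
            raise ValueError("unknown residue: " + residue)
        for position in range(3):
            # Collect every base seen at this codon position.
            bases = "".join(sorted({codon[position] for codon in codons}))
            primer.append(IUPAC[bases])
            pool_size *= len(bases)
    return "".join(primer), pool_size

# Hypothetical peptide chosen for low degeneracy (Met-Lys-Trp).
example_primer, example_pool = degenerate_primer("MKW")
print(example_primer, example_pool)
```

Residues with few codons (e.g., methionine and tryptophan) keep the pool small, which is one reason such regions are commonly favored when choosing primer sites.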

The PCR product can be subcloned and sequenced to 
ensure that the amplified sequences represent the sequence 
of the desired NHP gene. The PCR fragment can then be 
used to isolate a full length cDNA clone by a variety of 
methods. For example, the amplified fragment can be 
labeled and used to screen a cDNA library, such as a 
bacteriophage cDNA library. Alternatively, the labeled frag- 
ment can be used to isolate genomic clones via the screening 
of a genomic library. 

PCR technology can also be used to isolate full length 
cDNA sequences. For example, RNA can be isolated, fol- 
lowing standard procedures, from an appropriate cellular or 
tissue source (i.e., one known, or suspected, to express a 
NHP gene). A reverse transcription (RT) reaction can be 
performed on the RNA using an oligonucleotide primer 
specific for the most 5' end of the amplified fragment for the 
priming of first strand synthesis. The resulting RNA/DNA 
hybrid may then be "tailed" using a standard terminal 
transferase reaction, the hybrid may be digested with RNase 
H, and second strand synthesis may then be primed with a 
complementary primer. Thus, cDNA sequences upstream of 
the amplified fragment can be isolated. For a review of 
cloning strategies that can be used, see e.g., Sambrook et al., 
1989, supra. 

A cDNA encoding a mutant NHP sequence can be 
isolated, for example, by using PCR. In this case, the first 
cDNA strand may be synthesized by hybridizing an oligo-dT 
oligonucleotide to mRNA isolated from tissue known or 
suspected to express a NHP in an individual putatively carry- 
ing a mutant NHP allele, and by extending the new strand 
with reverse transcriptase. The second strand of the cDNA is 
then synthesized using an oligonucleotide that hybridizes 
specifically to the 5' end of the normal sequence. Using these 
two primers, the product is then amplified via PCR, option- 
ally cloned into a suitable vector, and subjected to DNA 
sequence analysis through methods well known to those of 
skill in the art. By comparing the DNA sequence of the 
mutant NHP allele to that of a corresponding normal NHP 
allele, the mutation(s) responsible for the loss or alteration 
of function of the mutant NHP gene product can be ascer- 
tained. 
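
The comparison step described above can be sketched as follows. This is a minimal illustration that assumes the normal and mutant sequences are already aligned and of equal length, so only substitutions are reported (insertions and deletions would require a proper alignment). The function name point_differences and both sequences are hypothetical.

```python
def point_differences(normal, mutant):
    """List (1-based position, normal base, mutant base) for each mismatch."""
    normal, mutant = normal.upper(), mutant.upper()
    if len(normal) != len(mutant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (index + 1, n, m)
        for index, (n, m) in enumerate(zip(normal, mutant))
        if n != m
    ]

# Hypothetical sequences (illustrative only): a G-to-A change at
# position 12 converts the codon TGG (Trp) to TGA (stop).
normal_allele = "ATGGCCAAGTGGTTC"
mutant_allele = "ATGGCCAAGTGATTC"

differences = point_differences(normal_allele, mutant_allele)
print(differences)
```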

Alternatively, a genomic library can be constructed using 
DNA obtained from an individual suspected of or known to 
carry a mutant NHP allele (e.g., a person manifesting a 
NHP-associated phenotype such as, for example, immune 
disorders, obesity, high blood pressure, etc.), or a cDNA 
library can be constructed using RNA from a tissue known, 
or suspected, to express a mutant NHP allele. A normal NHP 
gene, or any suitable fragment thereof, can then be labeled 
and used as a probe to identify the corresponding mutant 
NHP allele in such libraries. Clones containing mutant NHP 
sequences can then be purified and subjected to sequence 
analysis according to methods well known to those skilled in 
the art. 

Additionally, an expression library can be constructed 
utilizing cDNA synthesized from, for example, RNA iso- 
lated from a tissue known, or suspected, to express a mutant 
NHP allele in an individual suspected of or known to carry 
such a mutant allele. In this manner, gene products made by 
the putatively mutant tissue may be expressed and screened 
using standard antibody screening techniques in conjunction 
with antibodies raised against a normal NHP product, as 
described below. (For screening techniques, see, for 
example, Harlow, E. and Lane, eds., 1988, "Antibodies: A 
Laboratory Manual", Cold Spring Harbor Press, Cold Spring 
Harbor.) 

Additionally, screening can be accomplished by screening 
with labeled NHP fusion proteins, such as, for example, 
alkaline phosphatase-NHP or NHP-alkaline phosphatase 
fusion proteins. In cases where a NHP mutation results in an 
expression product with altered function (e.g., as a result of 
a missense or a frameshift mutation), polyclonal antibodies 
to NHP are likely to cross-react with a corresponding mutant 
NHP expression product. Library clones detected via their 
reaction with such labeled antibodies can be purified and 
subjected to sequence analysis according to methods well 
known in the art. 

An additional application of the described novel human 
polynucleotide sequences is their use in the molecular 
mutagenesis/evolution of proteins that are at least partially 
encoded by the described novel sequences using, for 
example, polynucleotide shuffling or related methodologies. 
Such approaches are described in U.S. Pat. Nos. 5,830,721, 
5,837,458, 6,117,679, and 5,723,323, which are herein 
incorporated by reference in their entirety. 

The invention also encompasses (a) DNA vectors that 
contain any of the foregoing NHP coding sequences and/or 
their complements (i.e., antisense); (b) DNA expression 
vectors that contain any of the foregoing NHP coding 
sequences operatively associated with a regulatory element 
that directs the expression of the coding sequences (for 
example, baculovirus as described in U.S. Pat. No. 5,869,336, 
herein incorporated by reference); (c) genetically engi- 
neered host cells that contain any of the foregoing NHP 
coding sequences operatively associated with a regulatory 
element that directs the expression of the coding sequences 
in the host cell; and (d) genetically engineered host cells that 
express an endogenous NHP sequence under the control of 
an exogenously introduced regulatory element (i.e., gene 
activation). As used herein, regulatory elements include, but 
are not limited to, inducible and non-inducible promoters, 
enhancers, operators and other elements known to those 
skilled in the art that drive and regulate expression. Such 
regulatory elements include but are not limited to the 
cytomegalovirus (hCMV) immediate early gene, 
regulatable, viral elements (particularly retroviral LTR 
promoters), the early or late promoters of SV40 or adenovirus, 
the lac system, the trp system, the TAC system, the TRC 
system, the major operator and promoter regions of phage 
lambda, the control regions of fd coat protein, the promoter 
for 3-phosphoglycerate kinase (PGK), the promoters of acid 
phosphatase, and the promoters of the yeast α-mating fac- 
tors. 

Where, as in the present instance, some of the described 
NHP peptides or polypeptides are thought to be cytoplasmic 
or nuclear proteins (although processed forms or fragments 
can be secreted or membrane associated), expression sys- 
tems can be engineered that produce soluble derivatives of 
a NHP (corresponding to a NHP extracellular and/or intra- 
cellular domains, or truncated polypeptides lacking one or 
more hydrophobic domains) and/or NHP fusion protein 
products (especially NHP-Ig fusion proteins, i.e., fusions of 
a NHP domain to an IgFc), NHP antibodies, and anti- 
idiotypic antibodies (including Fab fragments) that can be 
used in therapeutic applications. Preferably, the above 
expression systems are engineered to allow the desired 
peptide or polypeptide to be recovered from the culture 
media. 

The present invention also encompasses antibodies and 
anti-idiotypic antibodies (including Fab fragments), antago- 
nists and agonists of a NHP, as well as compounds or 
nucleotide constructs that inhibit expression of a NHP 
sequence (transcription factor inhibitors, antisense and 
ribozyme molecules, or open reading frame sequence or 
regulatory sequence replacement constructs), or promote the 
expression of a NHP (e.g., expression constructs in which 
NHP coding sequences are operatively associated with 
expression control elements such as promoters, promoter/ 
enhancers, etc.). 

The NHPs or NHP peptides, NHP fusion proteins, NHP 
nucleotide sequences, antibodies, antagonists and agonists 
can be useful for the detection of mutant NHPs or inappro- 
priately expressed NHPs for the diagnosis of disease. The 
NHP proteins or peptides, NHP fusion proteins, NHP nucle- 
otide sequences, host cell expression systems, antibodies, 
antagonists, agonists and genetically engineered cells and 
animals can be used for screening for drugs (or high 
throughput screening of combinatorial libraries) effective in 
the treatment of the symptomatic or phenotypic manifesta- 
tions of perturbing the normal function of a NHP in the body. 
The use of engineered host cells and/or animals can offer an 
advantage in that such systems allow not only for the 
identification of compounds that bind to the endogenous 
receptor/ligand of a NHP, but can also identify compounds 
that trigger NHP-mediated activities or pathways. 

Finally, the NHP products can be used as therapeutics. For 
example, soluble derivatives such as NHP peptides/domains 
corresponding to NHPs, NHP fusion protein products 
(especially NHP-Ig fusion proteins, i.e., fusions of a NHP, or 
a domain of a NHP, to an IgFc), NHP antibodies and 
anti-idiotypic antibodies (including Fab fragments), antago- 
nists or agonists (including compounds that modulate or act 
on downstream targets in a NHP-mediated pathway) can be 
used to directly treat diseases or disorders. For instance, the 
administration of an effective amount of soluble NHP, or a 
NHP-IgFc fusion protein or an anti-idiotypic antibody (or its 
Fab) that mimics the NHP could activate or effectively 
antagonize the endogenous NHP or a protein interactive 
therewith. Nucleotide constructs encoding such NHP prod- 
ucts can be used to genetically engineer host cells to express 
such products in vivo; these genetically engineered cells 
function as "bioreactors" in the body delivering a continuous 
supply of a NHP, a NHP peptide, or a NHP fusion protein to 
the body. Nucleotide constructs encoding functional NHPs, 
mutant NHPs, as well as antisense and ribozyme molecules 
can also be used in "gene therapy" approaches for the 
modulation of NHP expression. Thus, the invention also 
encompasses pharmaceutical formulations and methods for 
treating biological disorders. 

Various aspects of the invention are described in greater 
detail in the subsections below. 

5.1 THE NHP SEQUENCES 

The cDNA sequences and corresponding deduced amino 
acid sequences of the described NHPs are presented in the 
Sequence Listing. 

Expression analysis has provided evidence that the 
described NHPs can be expressed in a relatively narrow 
range of human tissues. In addition to serine-threonine 
kinases, the described NHPs also share significant similarity 
to a range of additional kinase families, including kinases 
associated with signal transduction, from a variety of phyla 
and species. 

An additional application of the described novel human 
polynucleotide sequences is their use in the molecular 
mutagenesis/evolution of proteins that are at least partially 
encoded by the described novel sequences using, for 
example, polynucleotide shuffling or related methodologies. 
Such approaches are described in U.S. Pat. Nos. 5,830,721 
and 5,837,458, which are herein incorporated by reference in 
their entirety. 

NHP gene products can also be expressed in transgenic 
animals. Animals of any species, including, but not limited 
to, worms, mice, rats, rabbits, guinea pigs, pigs, micro-pigs, 
birds, goats, and non-human primates, e.g., baboons, 
monkeys, and chimpanzees may be used to generate NHP 
transgenic animals. 

Any technique known in the art may be used to introduce 
a NHP transgene into animals to produce the founder lines 
of transgenic animals. Such techniques include, but are not 
limited to pronuclear microinjection (Hoppe and Wagner, 
1989, U.S. Pat. No. 4,873,191); retrovirus mediated gene 
transfer into germ lines (Van der Putten et al., 1985, Proc. 
Natl. Acad. Sci., USA 82:6148-6152); gene targeting in 
embryonic stem cells (Thompson et al., 1989, Cell 
56:313-321); electroporation of embryos (Lo, 1983, Mol. 
Cell. Biol. 3:1803-1814); and sperm-mediated gene transfer 
(Lavitrano et al., 1989, Cell 57:717-723); etc. For a review 
of such techniques, see Gordon, 1989, Transgenic Animals, 
Intl. Rev. Cytol. 115:171-229, which is incorporated by 
reference herein in its entirety. 

The present invention provides for transgenic animals that 
carry the NHP transgene in all their cells, as well as animals 
that carry the transgene in some, but not all their cells, i.e., 
mosaic animals or somatic cell transgenic animals. The 
transgene may be integrated as a single transgene or in 
concatamers, e.g., head-to-head tandems or head-to-tail tan- 
dems. The transgene may also be selectively introduced into 
and activated in a particular cell type by following, for 
example, the teaching of Lasko et al., 1992, Proc. Natl. 
Acad. Sci. USA 89:6232-6236. The regulatory sequences 
required for such a cell-type specific activation will depend 
upon the particular cell type of interest, and will be apparent 
to those of skill in the art. 

When it is desired that a NHP transgene be integrated into 
the chromosomal site of the endogenous NHP gene, gene 
targeting is preferred. Briefly, when such a technique is to be 
utilized, vectors containing some nucleotide sequences 
homologous to the endogenous NHP gene are designed for 
the purpose of integrating, via homologous recombination 
with chromosomal sequences, into and disrupting the func- 
tion of the nucleotide sequence of the endogenous NHP gene 
(i.e., "knockout" animals). 

The transgene can also be selectively introduced into a 
particular cell type, thus inactivating the endogenous NHP 
gene in only that cell type, by following, for example, the 
teaching of Gu et al., 1994, Science, 265:103-106. The 
regulatory sequences required for such a cell-type specific 
inactivation will depend upon the particular cell type of 
interest, and will be apparent to those of skill in the art. 

Once transgenic animals have been generated, the expres- 
sion of the recombinant NHP gene may be assayed utilizing 
standard techniques. Initial screening may be accomplished 
by Southern blot analysis or PCR techniques to analyze 
animal tissues to assay whether integration of the transgene 
has taken place. The level of mRNA expression of the 
transgene in the tissues of the transgenic animals may also 
be assessed using techniques that include but are not limited 
to Northern blot analysis of tissue samples obtained from the 
animal, in situ hybridization analysis, and RT-PCR. Samples 
of NHP gene-expressing tissue may also be evaluated 
immunocytochemically using antibodies specific for the 
NHP transgene product. 

5.2 NHPS AND NHP POLYPEPTIDES 

NHPs, NHP polypeptides, NHP peptide fragments, 
mutated, truncated, or deleted forms of the NHPs, and/or 
NHP fusion proteins can be prepared for a variety of uses. 
These uses include, but are not limited to, the generation of 
antibodies, as reagents in diagnostic assays, for the identi- 
fication of other cellular gene products related to a NHP, as 
reagents in assays for screening for compounds that can be 
used as pharmaceutical reagents useful in the therapeutic 
treatment of mental, biological, or medical disorders and 
disease. Given the similarity information and expression 
data, the described NHPs can be targeted (by drugs, oligos, 
antibodies, etc.) in order to treat disease, or to therapeuti- 
cally augment the efficacy of therapeutic agents. 

The Sequence Listing discloses the amino acid sequences 
encoded by the described NHP-encoding polynucleotides. 
The NHPs display initiator methionines that are present in 
DNA sequence contexts consistent with eucaryotic transla- 
tion initiation sites. The NHPs do not display consensus 
signal sequences, which indicates that they may be cyto- 
plasmic or possibly nuclear proteins, although they may also 
be secreted or membrane associated. 

The NHP amino acid sequences of the invention include 
the amino acid sequences presented in the Sequence Listing 
as well as analogues and derivatives thereof. Further, cor- 
responding NHP homologues from other species are encom- 
passed by the invention. In fact, any NHP protein encoded 
by the NHP nucleotide sequences described above is within 
the scope of the invention, as are any novel polynucleotide 
sequences encoding all or any novel portion of an amino 
acid sequence presented in the Sequence Listing. The degen- 
erate nature of the genetic code is well known, and, 
accordingly, each amino acid presented in the Sequence 
Listing is generically representative of the well known 
nucleic acid "triplet" codon, or in many cases codons, that 
can encode the amino acid. As such, as contemplated herein, 
the amino acid sequences presented in the Sequence Listing, 
when taken together with the genetic code (see, for example, 
Table 4-1 at page 109 of "Molecular Cell Biology", 1986, J. 
Darnell et al., eds., Scientific American Books, New York, 
NY, herein incorporated by reference) are generically rep- 
resentative of all the various permutations and combinations 
of nucleic acid sequences that can encode such amino acid 
sequences. 
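
To make the preceding point concrete: if residue i of a peptide can be encoded by n_i codons, the number of nucleic acid sequences encoding the peptide is the product of the n_i. The Python sketch below, which assumes the standard genetic code, counts and enumerates those sequences for short peptides. The function names and example peptides are hypothetical and are not taken from the Sequence Listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_codons(residue):
    """Return every codon that encodes the given one-letter residue."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == residue]

def count_encodings(peptide):
    """Number of distinct coding sequences for a peptide (product of n_i)."""
    count = 1
    for residue in peptide.upper():
        count *= len(synonymous_codons(residue))
    return count

def encodings(peptide):
    """Enumerate every coding sequence; practical only for short peptides."""
    choices = [synonymous_codons(residue) for residue in peptide.upper()]
    return ["".join(combo) for combo in product(*choices)]

# Met has 1 codon, Leu has 6, Ser has 6, so Met-Leu-Ser has 1 * 6 * 6.
print(count_encodings("MLS"))
print(sorted(encodings("MF")))
```

Because the count grows multiplicatively with peptide length, enumeration is shown only for a dipeptide; for full-length sequences only the count is practical.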

The invention also encompasses proteins that are func- 
tionally equivalent to the NHPs encoded by the presently 
described nucleotide sequences as judged by any of a 
number of criteria, including, but not limited to, the ability 
to bind and modify a NHP substrate, or the ability to effect 
an identical or complementary downstream pathway, or a 
change in cellular metabolism (e.g., proteolytic activity, ion 
flux, tyrosine phosphorylation, etc.). Such functionally 
equivalent NHP proteins include, but are not limited to, 
additions or substitutions of amino acid residues within the 
amino acid sequence encoded by the NHP nucleotide 
sequences described above, but that result in a silent change, 
thus producing a functionally equivalent expression product. 
Amino acid substitutions may be made on the basis of 
similarity in polarity, charge, solubility, hydrophobicity, 
hydrophilicity, and/or the amphipathic nature of the residues 
involved. For example, nonpolar (hydrophobic) amino acids 
include alanine, leucine, isoleucine, valine, proline, 
phenylalanine, tryptophan, and methionine; polar neutral 
amino acids include glycine, serine, threonine, cysteine, 
tyrosine, asparagine, and glutamine; positively charged 
(basic) amino acids include arginine, lysine, and histidine; 
and negatively charged (acidic) amino acids include aspartic 
acid and glutamic acid. 
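
The four residue classes listed above can be encoded directly, as in the following minimal sketch. It treats a substitution as conservative when both residues fall in the same class; that single criterion is a simplifying assumption for illustration, since the text also names solubility, hydrophilicity, and amphipathic nature as bases for substitution. The names RESIDUE_CLASSES, residue_class, and is_conservative are hypothetical.

```python
# One-letter codes grouped exactly as in the classes listed in the text.
RESIDUE_CLASSES = {
    "nonpolar": set("ALIVPFWM"),
    "polar neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}

def residue_class(residue):
    """Return the class name for a one-letter amino acid code."""
    for name, members in RESIDUE_CLASSES.items():
        if residue.upper() in members:
            return name
    raise ValueError("unknown residue: " + residue)

def is_conservative(original, replacement):
    """True when both residues belong to the same class."""
    return residue_class(original) == residue_class(replacement)

print(is_conservative("L", "I"))  # leucine to isoleucine, both nonpolar
print(is_conservative("D", "K"))  # aspartic acid to lysine, acidic to basic
```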

A variety of host-expression vector systems can be used 
to express the NHP nucleotide sequences of the invention. 
Where the NHP peptide or polypeptide can exist, or has been 
engineered to exist, as a soluble or secreted molecule, the 
soluble NHP peptide or polypeptide can be recovered from 
the culture media. Such expression systems also encompass 
engineered host cells that express a NHP, or functional 
equivalent, in situ. Purification or enrichment of a NHP from 
such expression systems can be accomplished using appro- 
priate detergents and lipid micelles and methods well known 
to those skilled in the art. However, such engineered host 
cells themselves may be used in situations where it is 
important not only to retain the structural and functional 
characteristics of the NHP, but to assess biological activity, 
e.g., in drug screening assays. 

The expression systems that may be used for purposes of 
the invention include but are not limited to microorganisms 
such as bacteria (e.g., E. coli, B. subtilis) transformed with 
recombinant bacteriophage DNA, plasmid DNA or cosmid 
DNA expression vectors containing NHP nucleotide 
sequences; yeast (e.g., Saccharomyces, Pichia) transformed 
with recombinant yeast expression vectors containing NHP 
nucleotide sequences; insect cell systems infected with 
recombinant virus expression vectors (e.g., baculovirus) 
containing NHP sequences; plant cell systems infected with 
recombinant virus expression vectors (e.g., cauliflower 
mosaic virus, CaMV; tobacco mosaic virus, TMV) or trans- 
formed with recombinant plasmid expression vectors (e.g., 
Ti plasmid) containing NHP nucleotide sequences; or mam- 
malian cell systems (e.g., COS, CHO, BHK, 293, 3T3) 
harboring recombinant expression constructs containing 
promoters derived from the genome of mammalian cells 
(e.g., metallothionein promoter) or from mammalian viruses 
(e.g., the adenovirus late promoter; the vaccinia virus 7.5K 
promoter). 

In bacterial systems, a number of expression vectors may 
be advantageously selected depending upon the use intended 
for the NHP product being expressed. For example, when a 
large quantity of such a protein is to be produced for the 
generation of pharmaceutical compositions of or containing 
NHP, or for raising antibodies to a NHP, vectors that direct 
the expression of high levels of fusion protein products that 
are readily purified may be desirable. Such vectors include, 
but are not limited to, the E. coli expression vector pUR278 
(Ruther et al., 1983, EMBO J. 2:1791), in which a NHP 
coding sequence may be ligated individually into the vector 
in frame with the lacZ coding region so that a fusion protein 
is produced; pIN vectors (Inouye & Inouye, 1985, Nucleic 
Acids Res. 13:3101-3109; Van Heeke & Schuster, 1989, J. 
Biol. Chem. 264:5503-5509); and the like. pGEX vectors 
may also be used to express foreign polypeptides as fusion 
proteins with glutathione S-transferase (GST). In general, 
such fusion proteins are soluble and can easily be purified 
from lysed cells by adsorption to glutathione-agarose beads 
followed by elution in the presence of free glutathione. The 
pGEX vectors are designed to include thrombin or factor Xa 
protease cleavage sites so that the cloned target expression 
product can be released from the GST moiety. 

In an insect system, Autographa californica nuclear poly- 
hedrosis virus (AcNPV) is used as a vector to express 
foreign polynucleotide sequences. The virus grows in 
Spodoptera frugiperda cells. A NHP coding sequence can be 
cloned individually into non-essential regions (for example 
the polyhedrin gene) of the virus and placed under control of 
an AcNPV promoter (for example the polyhedrin promoter). 
Successful insertion of NHP coding sequence will result in 
inactivation of the polyhedrin gene and production of non- 
occluded recombinant virus (i.e., virus lacking the proteina- 
ceous coat coded for by the polyhedrin gene). These recom- 
binant viruses are then used to infect Spodoptera frugiperda 
cells in which the inserted sequence is expressed (e.g., see 
Smith et al., 1983, J. Virol. 46: 584; Smith, U.S. Pat. No. 
4,215,051). 

In mammalian host cells, a number of viral-based expres- 
sion systems may be utilized. In cases where an adenovirus 
is used as an expression vector, the NHP nucleotide 
sequence of interest may be ligated to an adenovirus 
transcription/translation control complex, e.g., the late pro- 
moter and tripartite leader sequence. This chimeric sequence 
may then be inserted in the adenovirus genome by in vitro 
or in vivo recombination. Insertion in a non-essential region 
of the viral genome (e.g., region E1 or E3) will result in a 
recombinant virus that is viable and capable of expressing a 
NHP product in infected hosts (e.g., see Logan & Shenk, 
1984, Proc. Natl. Acad. Sci. USA 81:3655-3659). Specific 
initiation signals may also be required for efficient transla- 
tion of inserted NHP nucleotide sequences. These signals 
include the ATG initiation codon and adjacent sequences. In 
cases where an entire NHP gene or cDNA, including its own 
initiation codon and adjacent sequences, is inserted into the 
appropriate expression vector, no additional translational 
control signals may be needed. However, in cases where 
only a portion of a NHP coding sequence is inserted, 
exogenous translational control signals, including, perhaps, 
the ATG initiation codon, must be provided. Furthermore, 
the initiation codon must be in phase with the reading frame 
of the desired coding sequence to ensure translation of the 
entire insert. These exogenous translational control signals 
and initiation codons can be of a variety of origins, both 
natural and synthetic. The efficiency of expression may be 
enhanced by the inclusion of appropriate transcription 
enhancer elements, transcription terminators, etc. (See Bitter 
et al., 1987, Methods in Enzymol. 153:516-544). 
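
The requirement that an exogenous initiation codon be in phase with the reading frame of the insert can be checked arithmetically: the distance from the first base of the ATG to the first base of the insert's first codon must be a multiple of three. The sketch below is a minimal illustration; the function name, the leader and linker sequences, and the insert are all hypothetical.

```python
def initiation_codon_in_frame(construct, atg_index, insert_start):
    """True when the ATG at atg_index is in phase with the insert.

    Both indices are 0-based positions in the construct; insert_start is
    the first base of the first codon of the inserted coding sequence.
    """
    construct = construct.upper()
    if construct[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    if insert_start < atg_index + 3:
        raise ValueError("insert must begin downstream of the ATG")
    return (insert_start - atg_index) % 3 == 0

# Hypothetical construct parts (illustrative only).
leader = "GCCACC"
insert = "AAGTGGTTC"

# A 6-base linker keeps the insert in frame; a 7-base linker does not.
in_frame = leader + "ATG" + "GGATCC" + insert
out_of_frame = leader + "ATG" + "GGATCCA" + insert

print(initiation_codon_in_frame(in_frame, 6, 15))
print(initiation_codon_in_frame(out_of_frame, 6, 16))
```

This check covers phase only; it does not test for stop codons between the initiation codon and the insert, which would also prevent translation of the entire insert.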

In addition, a host cell strain may be chosen that modu- 
lates the expression of the inserted sequences, or modifies 
and processes the expression product in the specific fashion 
desired. Such modifications (e.g., glycosylation) and pro- 
cessing (e.g., cleavage) of protein products may be impor- 
tant for the function of the protein. Different host cells have 
characteristic and specific mechanisms for the post- 
translational processing and modification of proteins and 
expression products. Appropriate cell lines or host systems 
can be chosen to ensure the correct modification and pro- 
cessing of the foreign protein expressed. To this end, eukary- 
otic host cells that possess the cellular machinery for proper 
processing of the primary transcript, glycosylation, and 
phosphorylation of the expression product may be used. 
Such mammalian host cells include, but are not limited to, 
CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, 
and in particular, human cell lines. 

For long-term, high-yield production of recombinant 
proteins, stable expression is preferred. For example, cell 
lines that stably express the NHP sequences described above 
can be engineered. Rather than using expression vectors that 
contain viral origins of replication, host cells can be trans- 
formed with DNA controlled by appropriate expression 
control elements (e.g., promoter, enhancer sequences, tran- 
scription terminators, polyadenylation sites, etc.), and a 
selectable marker. Following the introduction of the foreign 
DNA, engineered cells may be allowed to grow for 1-2 days 
in an enriched media, and then are switched to a selective 
media. The selectable marker in the recombinant plasmid 
confers resistance to the selection and allows cells to stably 
integrate the plasmid into their chromosomes and grow to 
form foci, which in turn can be cloned and expanded into 
cell lines. This method may advantageously be used to 
engineer cell lines that express the NHP product. Such 
engineered cell lines may be particularly useful in screening 
and evaluation of compounds that affect the endogenous 
activity of the NHP product. 

A number of selection systems may be used, including but 
not limited to the herpes simplex virus thymidine kinase 
(Wigler, et al., 1977, Cell 11:223), hypoxanthine-guanine 
phosphoribosyltransferase (Szybalska & Szybalski, 1962, 
Proc. Natl. Acad. Sci. USA 48:2026), and adenine phospho- 
ribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes, 
which can be employed in tk⁻, hgprt⁻ or aprt⁻ cells, respec- 
tively. Also, antimetabolite resistance can be used as the 
basis of selection for the following genes: dhfr, which 
confers resistance to methotrexate (Wigler, et al., 1980, Proc. 
Natl. Acad. Sci. USA 77:3567; O'Hare, et al., 1981, Proc. Natl. 
Acad. Sci. USA 78:1527); gpt, which confers resistance to 
mycophenolic acid (Mulligan & Berg, 1981, Proc. Natl. 
Acad. Sci. USA 78:2072); neo, which confers resistance to 
the aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. 
Mol. Biol. 150:1); and hygro, which confers resistance to 
hygromycin (Santerre, et al., 1984, Gene 30:147). 

Alternatively, any fusion protein can be readily purified 
by utilizing an antibody specific for the fusion protein being 
expressed. For example, a system described by Janknecht et 
al. allows for the ready purification of non-denatured fusion 
proteins expressed in human cell lines (Janknecht, et al., 
1991, Proc. Natl. Acad. Sci. USA 88:8972-8976). In this 
system, the sequence of interest is subcloned into a vaccinia 
recombination plasmid such that the sequence's open read- 
ing frame is translationally fused to an amino-terminal tag 
consisting of six histidine residues. Extracts from cells 
infected with recombinant vaccinia virus are loaded onto 
Ni2+·nitriloacetic acid-agarose columns and histidine-tagged 
proteins are selectively eluted with imidazole-containing 
buffers. 
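
The in-frame fusion of a six-histidine amino-terminal tag can be illustrated at the sequence level. This is a minimal sketch under stated assumptions, not the Janknecht vector design: the function name `add_n_terminal_his_tag`, the choice of the CAC codon, and the added initiator ATG are all hypothetical choices made for illustration.

```python
HIS_CODON = "CAC"  # assumption: CAC chosen; CAT also encodes histidine


def add_n_terminal_his_tag(orf: str, n_his: int = 6) -> str:
    """Return ATG + n_his histidine codons fused in frame to `orf`."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3 to stay in frame")
    return "ATG" + HIS_CODON * n_his + orf


# First four codons of SEQ ID NO:1 (Met Ala Ser Thr), used as a toy ORF.
tagged = add_n_terminal_his_tag("atggccagcacc")
```

Because the tag contributes a whole number of codons, the reading frame of the downstream sequence is unchanged.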

Also encompassed by the present invention are fusion
proteins that direct the NHP to a target organ and/or facilitate
transport across the membrane into the cytosol. Conjugation
of NHPs to antibody molecules or their Fab fragments could
be used to target cells bearing a particular epitope. Attaching
the appropriate signal sequence to the NHP would also
transport the NHP to the desired location within the cell.
Alternatively, targeting of NHP or its nucleic acid sequence
might be achieved using liposome or lipid complex based
delivery systems. Such technologies are described in
"Liposomes: A Practical Approach", New, ed., Oxford University
Press, New York and in U.S. Pat. Nos. 4,594,595, 5,459,127,
5,948,767 and 6,110,490 and their respective disclosures,
which are herein incorporated by reference in their entirety.
Additionally embodied are novel protein constructs
engineered in such a way that they facilitate transport of the NHP
to the target site or desired organ, where they cross the cell
membrane and/or the nucleus where the NHP can exert its
functional activity. This goal may be achieved by coupling
of the NHP to a cytokine or other ligand that provides
targeting specificity, and/or to a protein transducing domain
(see generally U.S. applications Ser. No. 60/111,701 and
60/056,713, both of which are herein incorporated by
reference, for examples of such transducing sequences) to
facilitate passage across cellular membranes and can
optionally be engineered to include nuclear localization.

5.3 ANTIBODIES TO NHP PRODUCTS

Antibodies that specifically recognize one or more 
epitopes of a NHP, or epitopes of conserved variants of a 
NHP, or peptide fragments of a NHP are also encompassed 
by the invention. Such antibodies include but are not limited 
to polyclonal antibodies, monoclonal antibodies (mAbs), 
humanized or chimeric antibodies, single chain antibodies, 
Fab fragments, F(ab')2 fragments, fragments produced by a
Fab expression library, anti-idiotypic (anti-Id) antibodies, 
and epitope-binding fragments of any of the above. 

The antibodies of the invention can be used, for example, 
in the detection of NHP in a biological sample and may, 
therefore, be utilized as part of a diagnostic or prognostic 
technique whereby patients may be tested for abnormal 
amounts of NHP. Such antibodies may also be utilized in
conjunction with, for example, compound screening 
schemes for the evaluation of the effect of test compounds 
on expression and/or activity of a NHP expression product. 
Additionally, such antibodies can be used in conjunction
with gene therapy to, for example, evaluate the normal and/or
engineered NHP-expressing cells prior to their introduction 
into the patient. Such antibodies may additionally be used as 
a method for the inhibition of abnormal NHP activity. Thus,
such antibodies may be utilized as part of treatment
methods.

For the production of antibodies, various host animals 
may be immunized by injection with the NHP, a NHP 
peptide (e.g., one corresponding to a functional domain of a 
NHP), truncated NHP polypeptides (NHP in which one or 
more domains have been deleted), functional equivalents of 
the NHP or mutated variant of the NHP. Such host animals 
may include but are not limited to pigs, rabbits, mice, goats, 
and rats, to name but a few. Various adjuvants may be used 
to increase the immunological response, depending on the 
host species, including but not limited to Freund's adjuvant 
(complete and incomplete), mineral salts such as aluminum 
hydroxide or aluminum phosphate, chitosan, surface active 
substances such as lysolecithin, pluronic polyols, 
polyanions, peptides, oil emulsions, and potentially useful 
human adjuvants such as BCG (bacille Calmette-Guerin) 
and Corynebacterium parvum. Alternatively, the immune 
response could be enhanced by combination and/or coupling
with molecules such as keyhole limpet hemocyanin, tetanus 
toxoid, diphtheria toxoid, ovalbumin, cholera toxin or frag- 
ments thereof. Polyclonal antibodies are heterogeneous 
populations of antibody molecules derived from the sera of 
the immunized animals. 

Monoclonal antibodies, which are homogeneous popula- 
tions of antibodies to a particular antigen, can be obtained by 
any technique that provides for the production of antibody 
molecules by continuous cell lines in culture. These include, 
but are not limited to, the hybridoma technique of Kohler
and Milstein (1975, Nature 256:495-497; and U.S. Pat. No.
4,376,110), the human B-cell hybridoma technique (Kosbor
et al., 1983, Immunology Today 4:72; Cole et al., 1983, 
Proc. Natl. Acad. Sci. USA 80:2026-2030), and the EBV- 
hybridoma technique (Cole et al., 1985, Monoclonal Anti- 
bodies And Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). 
Such antibodies may be of any immunoglobulin class 
including IgG, IgM, IgE, IgA, IgD and any subclass thereof. 
The hybridoma producing the mAb of this invention may be 
cultivated in vitro or in vivo. Production of high titers of 
mAbs in vivo makes this the presently preferred method of 
production. 

In addition, techniques developed for the production of
"chimeric antibodies" (Morrison et al., 1984, Proc. Natl.
Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature,
312:604-608; Takeda et al., 1985, Nature, 314:452-454) by
splicing the genes from a mouse antibody molecule of
appropriate antigen specificity together with genes from a
human antibody molecule of appropriate biological activity
can be used. A chimeric antibody is a molecule in which
different portions are derived from different animal species,
such as those having a variable region derived from a murine
mAb and a human immunoglobulin constant region. Such
technologies are described in U.S. Pat. Nos. 6,075,181 and
5,877,397 and their respective disclosures, which are herein
incorporated by reference in their entirety. Also encompassed
by the present invention is the use of fully humanized
monoclonal antibodies as described in U.S. Pat. No.
6,150,584 and respective disclosures, which are herein
incorporated by reference in their entirety.

Alternatively, techniques described for the production of
single chain antibodies (U.S. Pat. No. 4,946,778; Bird, 1988,
Science 242:423-426; Huston et al., 1988, Proc. Natl. Acad.
Sci. USA 85:5879-5883; and Ward et al., 1989, Nature
341:544-546) can be adapted to produce single chain
antibodies against NHP expression products. Single chain
antibodies are formed by linking the heavy and light chain
fragments of the Fv region via an amino acid bridge,
resulting in a single chain polypeptide.

Antibody fragments that recognize specific epitopes may
be generated by known techniques. For example, such
fragments include, but are not limited to: the F(ab')2
fragments, which can be produced by pepsin digestion of the
antibody molecule, and the Fab fragments, which can be
generated by reducing the disulfide bridges of the F(ab')2
fragments. Alternatively, Fab expression libraries may be
constructed (Huse et al., 1989, Science, 246:1275-1281) to
allow rapid and easy identification of monoclonal Fab
fragments with the desired specificity.

Antibodies to a NHP can, in turn, be utilized to generate
anti-idiotype antibodies that "mimic" a given NHP, using
techniques well known to those skilled in the art. (See, e.g.,
Greenspan & Bona, 1993, FASEB J 7(5):437-444; and
Nisonoff, 1991, J. Immunol. 147(8):2429-2438). For
example, antibodies that bind to a NHP domain and
competitively inhibit the binding of NHP to its cognate
receptor/ligand can be used to generate anti-idiotypes that "mimic"
the NHP and, therefore, bind, activate, or neutralize a NHP,
NHP receptor, or NHP ligand. Such anti-idiotypic antibodies
or Fab fragments of such anti-idiotypes can be used in
therapeutic regimens involving a NHP mediated pathway.

Additionally, given the high degree of relatedness of
mammalian NHPs, the presently described knock-out mice
(having never seen NHP, and thus never been tolerized to
NHP) have a unique utility, as they can be advantageously
applied to the generation of antibodies against the disclosed
mammalian NHP (i.e., NHP will be immunogenic in NHP
knock-out animals).

The present invention is not to be limited in scope by the
specific embodiments described herein, which are intended
as single illustrations of individual aspects of the invention,
and functionally equivalent methods and components are
within the scope of the invention. Indeed, various
modifications of the invention, in addition to those shown and
described herein will become apparent to those skilled in the
art from the foregoing description. Such modifications are
intended to fall within the scope of the appended claims. All
cited publications, patents, and patent applications are herein
incorporated by reference in their entirety.

SEQUENCE LISTING



<160> NUMBER OF SEQ ID NOS: 4 

<210> SEQ ID NO 1 

<211> LENGTH: 2301 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 1 

atggccagca ccaggagtat cgagctggag cactttgagg aacgggacaa aaggccgcgg 60 

ccggggtcgc ggagaggggc ccccagctcc tccgggggca gcagcagctc gggccccaag 120 

gggaacgggc tcatccccag tccggcgcac agtgcccact gcagcttcta ccgcacgcgg 180 

accctgcagg ccctcagctc ggagaagaag gccaagaagg cgcgcttcta ccggaacggg 240 

gaccgctact tcaagggcct ggtgtttgcc atctccagcg accgcttccg gtcyttcgat 300 

gcgctcctca tagagctcac ccgctccctg tcggacaacg tgaacctgcc ccagggtgtc 360 

cgcactatct acaccatcga cggcagccgg aaggtcacca gcctggacga gctgctggaa 420 

ggtgagagtt acgtgtgtgc atccaatgaa ccatttcgta aagtcgatta caccaaaaat 480 

attaatccaa actggtctgt gaacatcaag ggtgggacat cccgagcgct ggctgctgcc 540 

tcctctgtga aaagtgaagt aaaagaaagt aaagatttca tcaaacccaa gttagtgact 600 

gtgattcgaa gtggagtgaa gcctagaaaa gccgtgcgga tccttctgaa taaaaagact 660 

gctcattcct ttgaacaagt cttaacagat atcaccgaag ccattaaact agactcagga 720 

gtcgtcaaga ggctctgcac cctggatgga aagcaggtta cttgtctgca agactttttt 780 

ggtgatgacg atgtttttat tgcatgtgga ccagaaaaat ttcgttatgc ccaagatgac 840

tttgtcctgg atcatagtga atgtcgtgtc ctgaagtcat cttattctcg atcctcagct 900 

gttaagtatt ctggatccaa aagccctggg ccctctcgac gcagcaaatc accagcttca 960 

gttaatggaa ctcccagcag ccaactttct actcctaaat ctacgaaatc ctccagttcc 1020 

tctccaacta gtccaggaag tttcagagga ttaaagcaga tttctgctca tggcagatct 1080 

tcttccaatg taaacggtgg acctgagctt gaccgttgca taagtcctga aggtgtgaat 1140 

ggaaacagat gctctgaatc atcaactctt cttgagaaat acaaaattgg aaaggtcatt 1200 

ggtgatggca attttgcagt agtcaaagag tgtatagaca ggtccactgg aaaggagttt 1260 

gccctaaaga ttatagacaa agccaaatgt tgtggaaagg aacacctgat tgagaatgaa 1320 

gtgtcaatac tgcgccgagt gaaacatccc aatatcatta tgctggtcga ggagatggaa 1380 

acagcaactg agctctttct ggtgatggaa ttggtcaaag gtggagatct ctttgatgca 1440

attacttcgt cgaccaagta cactgagaga gatggcagtg ccatggtgta caacttagcc 1500 

aatgccctca ggtatctcca tggcctcagc atcgtgcaca gagacatcaa accagagaat 1560 

ctcttggtgt gtgaatatcc tgatggaacc aagtctttga aactgggaga ctttgggctt 1620 

gcgactgtgg tagaaggccc tttatacaca gtctgtggca cacccactta tgtggctcca 1680 

gaaatcattg ctgaaactgg ctatggcctg aaggtggaca tttgggcagc tggtgtgatc 1740

acatacatac ttctctgtgg attcccacca ttccgaagtg agaacaatct ccaggaagat 1800 

ctcttcgacc agatcttggc tgggaagctg gagtttccgg ccccctactg ggataacatc 1860 

acggactctg ccaaggaatt aatcagtcaa atgcttcagg taaatgttga agctcggtgt 1920 

accgcgggac aaatcctgag tcacccctgg gtgtcagatg atgcctccca ggagaataac 1980 

atgcaagctg aggtgacagg taaactaaaa cagcacttta ataatgcgct ccccaaacag 2040

aacagcacta ccaccggggt ctccgtcatc atgaacacgg ctctagataa ggaggggcag 2100

attttctgca gcaagcactg tcaagacagc ggcaggcctg ggatggagcc catctctcca 2160

gttcctccct cagtggagga gatccctgtg cctggggaag cagtcccggc ccccacccct 2220

ccggaatctc ccacccccca ctgtcctccc gctgccccgg gtggtgagcg ggcaggaacc 2280

tggcgccgcc accgagactg a 2301



<210> SEQ ID NO 2 

<211> LENGTH: 766 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 2 

Met Ala Ser Thr Arg Ser Ile Glu Leu Glu His Phe Glu Glu Arg Asp
1 5 10 15

Lys Arg Pro Arg Pro Gly Ser Arg Arg Gly Ala Pro Ser Ser Ser Gly
20 25 30

Gly Ser Ser Ser Ser Gly Pro Lys Gly Asn Gly Leu Ile Pro Ser Pro
35 40 45

Ala His Ser Ala His Cys Ser Phe Tyr Arg Thr Arg Thr Leu Gln Ala
50 55 60

Leu Ser Ser Glu Lys Lys Ala Lys Lys Ala Arg Phe Tyr Arg Asn Gly
65 70 75 80

Asp Arg Tyr Phe Lys Gly Leu Val Phe Ala Ile Ser Ser Asp Arg Phe
85 90 95

Arg Ser Phe Asp Ala Leu Leu Ile Glu Leu Thr Arg Ser Leu Ser Asp
100 105 110

Asn Val Asn Leu Pro Gln Gly Val Arg Thr Ile Tyr Thr Ile Asp Gly
115 120 125 

Ser Arg Lys Val Thr Ser Leu Asp Glu Leu Leu Glu Gly Glu Ser Tyr 
130 135 140 

Val Cys Ala Ser Asn Glu Pro Phe Arg Lys Val Asp Tyr Thr Lys Asn 
145 150 155 160 

Ile Asn Pro Asn Trp Ser Val Asn Ile Lys Gly Gly Thr Ser Arg Ala
165 170 175 

Leu Ala Ala Ala Ser Ser Val Lys Ser Glu Val Lys Glu Ser Lys Asp 
180 185 190 

Phe Ile Lys Pro Lys Leu Val Thr Val Ile Arg Ser Gly Val Lys Pro
195 200 205

Arg Lys Ala Val Arg Ile Leu Leu Asn Lys Lys Thr Ala His Ser Phe
210 215 220

Glu Gln Val Leu Thr Asp Ile Thr Glu Ala Ile Lys Leu Asp Ser Gly
225 230 235 240

Val Val Lys Arg Leu Cys Thr Leu Asp Gly Lys Gln Val Thr Cys Leu
245 250 255

Gln Asp Phe Phe Gly Asp Asp Asp Val Phe Ile Ala Cys Gly Pro Glu
260 265 270

Lys Phe Arg Tyr Ala Gln Asp Asp Phe Val Leu Asp His Ser Glu Cys
275 280 285 

Arg Val Leu Lys Ser Ser Tyr Ser Arg Ser Ser Ala Val Lys Tyr Ser 
290 295 300 

Gly Ser Lys Ser Pro Gly Pro Ser Arg Arg Ser Lys Ser Pro Ala Ser 
305 310 315 320

Val Asn Gly Thr Pro Ser Ser Gln Leu Ser Thr Pro Lys Ser Thr Lys
325 330 335

Ser Ser Ser Ser Ser Pro Thr Ser Pro Gly Ser Phe Arg Gly Leu Lys
340 345 350

Gln Ile Ser Ala His Gly Arg Ser Ser Ser Asn Val Asn Gly Gly Pro
355 360 365

Glu Leu Asp Arg Cys Ile Ser Pro Glu Gly Val Asn Gly Asn Arg Cys
370 375 380

Ser Glu Ser Ser Thr Leu Leu Glu Lys Tyr Lys Ile Gly Lys Val Ile
385 390 395 400

Gly Asp Gly Asn Phe Ala Val Val Lys Glu Cys Ile Asp Arg Ser Thr
405 410 415

Gly Lys Glu Phe Ala Leu Lys Ile Ile Asp Lys Ala Lys Cys Cys Gly
420 425 430

Lys Glu His Leu Ile Glu Asn Glu Val Ser Ile Leu Arg Arg Val Lys
435 440 445

His Pro Asn Ile Ile Met Leu Val Glu Glu Met Glu Thr Ala Thr Glu
450 455 460 

Leu Phe Leu Val Met Glu Leu Val Lys Gly Gly Asp Leu Phe Asp Ala 
465 470 475 480 

Ile Thr Ser Ser Thr Lys Tyr Thr Glu Arg Asp Gly Ser Ala Met Val
485 490 495

Tyr Asn Leu Ala Asn Ala Leu Arg Tyr Leu His Gly Leu Ser Ile Val
500 505 510

His Arg Asp Ile Lys Pro Glu Asn Leu Leu Val Cys Glu Tyr Pro Asp
515 520 525 

Gly Thr Lys Ser Leu Lys Leu Gly Asp Phe Gly Leu Ala Thr Val Val 
530 535 540 

Glu Gly Pro Leu Tyr Thr Val Cys Gly Thr Pro Thr Tyr Val Ala Pro 
545 550 555 560 

Glu Ile Ile Ala Glu Thr Gly Tyr Gly Leu Lys Val Asp Ile Trp Ala
565 570 575

Ala Gly Val Ile Thr Tyr Ile Leu Leu Cys Gly Phe Pro Pro Phe Arg
580 585 590

Ser Glu Asn Asn Leu Gln Glu Asp Leu Phe Asp Gln Ile Leu Ala Gly
595 600 605

Lys Leu Glu Phe Pro Ala Pro Tyr Trp Asp Asn Ile Thr Asp Ser Ala
610 615 620

Lys Glu Leu Ile Ser Gln Met Leu Gln Val Asn Val Glu Ala Arg Cys
625 630 635 640

Thr Ala Gly Gln Ile Leu Ser His Pro Trp Val Ser Asp Asp Ala Ser
645 650 655

Gln Glu Asn Asn Met Gln Ala Glu Val Thr Gly Lys Leu Lys Gln His
660 665 670

Phe Asn Asn Ala Leu Pro Lys Gln Asn Ser Thr Thr Thr Gly Val Ser
675 680 685

Val Ile Met Asn Thr Ala Leu Asp Lys Glu Gly Gln Ile Phe Cys Ser
690 695 700

Lys His Cys Gln Asp Ser Gly Arg Pro Gly Met Glu Pro Ile Ser Pro
705 710 715 720

Val Pro Pro Ser Val Glu Glu Ile Pro Val Pro Gly Glu Ala Val Pro
725 730 735

Ala Pro Thr Pro Pro Glu Ser Pro Thr Pro His Cys Pro Pro Ala Ala
740 745 750

Pro Gly Gly Glu Arg Ala Gly Thr Trp Arg Arg His Arg Asp
755 760 765 



<210> SEQ ID NO 3 

<211> LENGTH: 2298 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 3 

atggccagca ccaggagtat cgagctggag cactttgagg aacgggacaa aaggccgcgg 60 

ccggggtcgc ggagaggggc ccccagctcc tccgggggca gcagcagctc gggccccaag 120 

gggaacgggc tcatccccag tccggcgcac agtgcccact gcagcttcta ccgcacgcgg 180 

accctgcagg ccctcagctc ggagaagaag gccaagaagg cgcgcttcta ccggaacggg 240 

gaccgctact tcaagggcct ggtgtttgcc atctccagcg accgcttccg gtcyttcgat 300 

gcgctcctca tagagctcac ccgctccctg tcggacaacg tgaacctgcc ccagggtgtc 360 

cgcactatct acaccatcga cggcagccgg aaggtcacca gcctggacga gctgctggaa 420 

ggtgagagtt acgtgtgtgc atccaatgaa ccatttcgta aagtcgatta caccaaaaat 480 

attaatccaa actggtctgt gaacatcaag ggtgggacat cccgagcgct ggctgctgcc 540 

tcctctgtga aaagtgaagt aaaagaaagt aaagatttca tcaaacccaa gttagtgact 600 

gtgattcgaa gtggagtgaa gcctagaaaa gccgtgcgga tccttctgaa taaaaagact 660 

gctcattcct ttgaacaagt cttaacagat atcaccgaag ccattaaact agactcagga 720 

gtcgtcaaga ggctctgcac cctggatgga aagcaggtta cttgtctgca agactttttt 780 

ggtgatgacg atgtttttat tgcatgtgga ccagaaaaat ttcgttatgc ccaagatgac 840 

tttgtcctgg atcatagtga atgtcgtgtc ctgaagtcat cttattctcg atcctcagct 900 

gttaagtatt ctggatccaa aagccctggg ccctctcgac gcagcaaatc accagcttca 960 

gttaatggaa ctcccagcag ccaactttct actcctaaat ctacgaaatc ctccagttcc 1020 

tctccaacta gtccaggaag tttcagagga ttaaagattt ctgctcatgg cagatcttct 1080 

tccaatgtaa acggtggacc tgagcttgac cgttgcataa gtcctgaagg tgtgaatgga 1140 

aacagatgct ctgaatcatc aactcttctt gagaaataca aaattggaaa ggtcattggt 1200 

gatggcaatt ttgcagtagt caaagagtgt atagacaggt ccactggaaa ggagtttgcc 1260 

ctaaagatta tagacaaagc caaatgttgt ggaaaggaac acctgattga gaatgaagtg 1320 

tcaatactgc gccgagtgaa acatcccaat atcattatgc tggtcgagga gatggaaaca 1380 

gcaactgagc tctttctggt gatggaattg gtcaaaggtg gagatctctt tgatgcaatt 1440 

acttcgtcga ccaagtacac tgagagagat ggcagtgcca tggtgtacaa cttagccaat 1500 

gccctcaggt atctccatgg cctcagcatc gtgcacagag acatcaaacc agagaatctc 1560 

ttggtgtgtg aatatcctga tggaaccaag tctttgaaac tgggagactt tgggcttgcg 1620 

actgtggtag aaggcccttt atacacagtc tgtggcacac ccacttatgt ggctccagaa 1680 

atcattgctg aaactggcta tggcctgaag gtggacattt gggcagctgg tgtgatcaca 1740

tacatacttc tctgtggatt cccaccattc cgaagtgaga acaatctcca ggaagatctc 1800 

ttcgaccaga tcttggctgg gaagctggag tttccggccc cctactggga taacatcacg 1860 

gactctgcca aggaattaat cagtcaaatg cttcaggtaa atgttgaagc tcggtgtacc 1920 

gcgggacaaa tcctgagtca cccctgggtg tcagatgatg cctcccagga gaataacatg 1980

caagctgagg tgacaggtaa actaaaacag cactttaata atgcgctccc caaacagaac 2040

agcactacca ccggggtctc cgtcatcatg aacacggctc tagataagga ggggcagatt 2100 

ttctgcagca agcactgtca agacagcggc aggcctggga tggagcccat ctctccagtt 2160 

cctccctcag tggaggagat ccctgtgcct ggggaagcag tcccggcccc cacccctccg 2220 

gaatctccca ccccccactg tcctcccgct gccccgggtg gtgagcgggc aggaacctgg 2280 

cgccgccacc gagactga 2298 



<210> SEQ ID NO 4 

<211> LENGTH: 765 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 4 

Met Ala Ser Thr Arg Ser Ile Glu Leu Glu His Phe Glu Glu Arg Asp
1 5 10 15

Lys Arg Pro Arg Pro Gly Ser Arg Arg Gly Ala Pro Ser Ser Ser Gly
20 25 30

Gly Ser Ser Ser Ser Gly Pro Lys Gly Asn Gly Leu Ile Pro Ser Pro
35 40 45

Ala His Ser Ala His Cys Ser Phe Tyr Arg Thr Arg Thr Leu Gln Ala
50 55 60

Leu Ser Ser Glu Lys Lys Ala Lys Lys Ala Arg Phe Tyr Arg Asn Gly
65 70 75 80

Asp Arg Tyr Phe Lys Gly Leu Val Phe Ala Ile Ser Ser Asp Arg Phe
85 90 95

Arg Ser Phe Asp Ala Leu Leu Ile Glu Leu Thr Arg Ser Leu Ser Asp
100 105 110

Asn Val Asn Leu Pro Gln Gly Val Arg Thr Ile Tyr Thr Ile Asp Gly
115 120 125 

Ser Arg Lys Val Thr Ser Leu Asp Glu Leu Leu Glu Gly Glu Ser Tyr 
130 135 140 

Val Cys Ala Ser Asn Glu Pro Phe Arg Lys Val Asp Tyr Thr Lys Asn 
145 150 155 160 

Ile Asn Pro Asn Trp Ser Val Asn Ile Lys Gly Gly Thr Ser Arg Ala
165 170 175 

Leu Ala Ala Ala Ser Ser Val Lys Ser Glu Val Lys Glu Ser Lys Asp 
180 185 190 

Phe Ile Lys Pro Lys Leu Val Thr Val Ile Arg Ser Gly Val Lys Pro
195 200 205

Arg Lys Ala Val Arg Ile Leu Leu Asn Lys Lys Thr Ala His Ser Phe
210 215 220

Glu Gln Val Leu Thr Asp Ile Thr Glu Ala Ile Lys Leu Asp Ser Gly
225 230 235 240

Val Val Lys Arg Leu Cys Thr Leu Asp Gly Lys Gln Val Thr Cys Leu
245 250 255

Gln Asp Phe Phe Gly Asp Asp Asp Val Phe Ile Ala Cys Gly Pro Glu
260 265 270

Lys Phe Arg Tyr Ala Gln Asp Asp Phe Val Leu Asp His Ser Glu Cys
275 280 285 

Arg Val Leu Lys Ser Ser Tyr Ser Arg Ser Ser Ala Val Lys Tyr Ser 
290 295 300 



Gly Ser Lys Ser Pro Gly Pro Ser Arg Arg Ser Lys Ser Pro Ala Ser 
305 310 315 320

Val Asn Gly Thr Pro Ser Ser Gln Leu Ser Thr Pro Lys Ser Thr Lys
325 330 335

Ser Ser Ser Ser Ser Pro Thr Ser Pro Gly Ser Phe Arg Gly Leu Lys
340 345 350

Ile Ser Ala His Gly Arg Ser Ser Ser Asn Val Asn Gly Gly Pro Glu
355 360 365

Leu Asp Arg Cys Ile Ser Pro Glu Gly Val Asn Gly Asn Arg Cys Ser
370 375 380

Glu Ser Ser Thr Leu Leu Glu Lys Tyr Lys Ile Gly Lys Val Ile Gly
385 390 395 400

Asp Gly Asn Phe Ala Val Val Lys Glu Cys Ile Asp Arg Ser Thr Gly
405 410 415

Lys Glu Phe Ala Leu Lys Ile Ile Asp Lys Ala Lys Cys Cys Gly Lys
420 425 430

Glu His Leu Ile Glu Asn Glu Val Ser Ile Leu Arg Arg Val Lys His
435 440 445

Pro Asn Ile Ile Met Leu Val Glu Glu Met Glu Thr Ala Thr Glu Leu
450 455 460

Phe Leu Val Met Glu Leu Val Lys Gly Gly Asp Leu Phe Asp Ala Ile
465 470 475 480 

Thr Ser Ser Thr Lys Tyr Thr Glu Arg Asp Gly Ser Ala Met Val Tyr 
485 490 495 

Asn Leu Ala Asn Ala Leu Arg Tyr Leu His Gly Leu Ser Ile Val His
500 505 510

Arg Asp Ile Lys Pro Glu Asn Leu Leu Val Cys Glu Tyr Pro Asp Gly
515 520 525 

Thr Lys Ser Leu Lys Leu Gly Asp Phe Gly Leu Ala Thr Val Val Glu 
530 535 540 

Gly Pro Leu Tyr Thr Val Cys Gly Thr Pro Thr Tyr Val Ala Pro Glu 
545 550 555 560 

Ile Ile Ala Glu Thr Gly Tyr Gly Leu Lys Val Asp Ile Trp Ala Ala
565 570 575

Gly Val Ile Thr Tyr Ile Leu Leu Cys Gly Phe Pro Pro Phe Arg Ser
580 585 590

Glu Asn Asn Leu Gln Glu Asp Leu Phe Asp Gln Ile Leu Ala Gly Lys
595 600 605

Leu Glu Phe Pro Ala Pro Tyr Trp Asp Asn Ile Thr Asp Ser Ala Lys
610 615 620

Glu Leu Ile Ser Gln Met Leu Gln Val Asn Val Glu Ala Arg Cys Thr
625 630 635 640

Ala Gly Gln Ile Leu Ser His Pro Trp Val Ser Asp Asp Ala Ser Gln
645 650 655

Glu Asn Asn Met Gln Ala Glu Val Thr Gly Lys Leu Lys Gln His Phe
660 665 670

Asn Asn Ala Leu Pro Lys Gln Asn Ser Thr Thr Thr Gly Val Ser Val
675 680 685

Ile Met Asn Thr Ala Leu Asp Lys Glu Gly Gln Ile Phe Cys Ser Lys
690 695 700

His Cys Gln Asp Ser Gly Arg Pro Gly Met Glu Pro Ile Ser Pro Val
705 710 715 720

Pro Pro Ser Val Glu Glu Ile Pro Val Pro Gly Glu Ala Val Pro Ala
725 730 735

Pro Thr Pro Pro Glu Ser Pro Thr Pro His Cys Pro Pro Ala Ala Pro
740 745 750 

Gly Gly Glu Arg Ala Gly Thr Trp Arg Arg His Arg Asp 
755 760 765 
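
The declared lengths in the Sequence Listing can be cross-checked with simple arithmetic: an ORF encoding n residues followed by one stop codon spans 3(n + 1) nucleotides. The sketch below is illustrative only; the names `orf_length_nt` and `LISTING` are hypothetical, and the two short fragments are copied from the rows ending at position 1080 of SEQ ID NOS: 1 and 3.

```python
def orf_length_nt(n_residues: int) -> int:
    """Nucleotides in an ORF encoding n_residues plus one stop codon."""
    return 3 * (n_residues + 1)


# (DNA SEQ ID NO, declared DNA length, protein SEQ ID NO, declared protein length)
LISTING = [
    (1, 2301, 2, 766),
    (3, 2298, 4, 765),
]

for _dna_id, dna_len, _prot_id, prot_len in LISTING:
    # Each declared DNA length matches its declared protein length.
    assert orf_length_nt(prot_len) == dna_len

# The two ORFs differ by exactly one codon.
codon_difference = (2301 - 2298) // 3

# Local region where the listings diverge (Leu Lys Gln Ile vs. Leu Lys Ile).
region_seq1 = "ttaaagcagatttct"
region_seq3 = "ttaaagatttct"
assert region_seq1.replace("cag", "", 1) == region_seq3
```

Under these assumptions, SEQ ID NO:3 reads as SEQ ID NO:1 with a single in-frame codon (cag, glutamine) removed, which agrees with the one-residue difference between SEQ ID NOS: 2 and 4.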



What is claimed is: 

1. An isolated nucleic acid molecule comprising a nucle- 
otide sequence drawn from the group consisting of SEQ ID 
NO:1 and SEQ ID NO:3.

2. An isolated nucleic acid molecule comprising a nucle- 
otide sequence that: 

(a) encodes the amino acid sequence shown in SEQ ID 
NO:2; and 

(b) hybridizes under stringent conditions to the nucleotide 
sequence of SEQ ID NO:1 or the complement thereof.



3. An isolated nucleic acid molecule comprising a nucleotide
sequence encoding the amino acid sequence shown in
SEQ ID NO:2. 

4. An isolated nucleic acid molecule comprising a nucle- 
otide sequence encoding the amino acid sequence shown in 
SEQ ID NO:4. 

*****

HUMAN PHOSPHATASES AND
POLYNUCLEOTIDES ENCODING THE
SAME

The present application claims the benefit of U.S. Provisional
Application No. 60/210,607, which was filed on Jun.
9, 2000 and is herein incorporated by reference in its
entirety. 

1. INTRODUCTION 

The present invention relates to the discovery, 
identification, and characterization of novel human poly- 
nucleotides encoding proteins that share sequence similarity 
with animal phosphatases. The invention encompasses the 
described polynucleotides, host cell expression systems, the 
encoded proteins, fusion proteins, polypeptides and 
peptides, antibodies to the encoded proteins and peptides, 
and genetically engineered animals that either lack or
overexpress the disclosed genes, antagonists and agonists of the
proteins, and other compounds that modulate the expression 
or activity of the proteins encoded by the disclosed genes 
that can be used for diagnosis, drug screening, clinical trial 
monitoring, the treatment of physiological disorders, or 
otherwise contributing to the quality of life. 

2. BACKGROUND OF THE INVENTION 

Membrane proteins can act as, inter alia, ligand receptors, 
signal transducers, neuronal guidance proteins, cell adhesion 
proteins, cell surface markers, and can also possess enzy- 
matic functions, such as the phosphorylation of substrates 
(i.e., kinase activity). Phosphatases mediate dephosphoryla- 
tion of a wide variety of proteins and compounds in the cell. 
Often working in conjunction with kinases, phosphatases are 
involved in regulating a wide range of biochemical and
physiological pathways. Given the physiological importance 
of phosphatases, they have been subject to significant scru- 
tiny and are good drug targets. 

3. SUMMARY OF THE INVENTION 

The present invention relates to the discovery, 
identification, and characterization of nucleotides that 
encode novel human proteins and the corresponding amino 
acid sequences of these proteins. The novel human proteins 
(NHPs) described for the first time herein share structural 
similarity with animal immunoglobulin super family cell 
surface proteins, proteins that play a role in neuronal guid- 
ance (e.g., nope, punc, unc, and neogenin), phosphatases, 
netrin receptors, DCC (deleted in colon cancer) including, 
but not limited to tyrosine phosphatases, and cell adhesion 
molecules as homologues and orthologs across a range of 
phyla and species. 

The novel human polynucleotides described herein
encode open reading frames (ORFs) encoding proteins of
1069, 380, 904, 1150, 985, 991, 302, 826, 1072, 907, 712,
624, 547, 793, and 628 amino acids in length (see SEQ ID
NOS: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, and
30 respectively).
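
The pairing of the fifteen ORF lengths with the fifteen even-numbered SEQ ID NOS can be made explicit. This is a hedged sketch: the names `ORF_LENGTHS_AA`, `seq_id_to_length`, and `coding_nt` are hypothetical, and the nucleotide count assumes each ORF ends in a single stop codon, which the text above does not state.

```python
# Protein lengths (amino acids) in the order given in the text.
ORF_LENGTHS_AA = [1069, 380, 904, 1150, 985, 991, 302, 826,
                  1072, 907, 712, 624, 547, 793, 628]

# Even-numbered protein SEQ ID NOS: 2, 4, ..., 30.
PROTEIN_SEQ_IDS = list(range(2, 31, 2))

# The two lists must pair one-to-one for "respectively" to hold.
assert len(ORF_LENGTHS_AA) == len(PROTEIN_SEQ_IDS) == 15

seq_id_to_length = dict(zip(PROTEIN_SEQ_IDS, ORF_LENGTHS_AA))


def coding_nt(seq_id: int) -> int:
    """Coding nucleotides for the ORF, assuming one terminal stop codon."""
    return 3 * (seq_id_to_length[seq_id] + 1)
```

For example, `seq_id_to_length[2]` gives 1069 residues, and under the stop-codon assumption `coding_nt(4)` gives 1143 nucleotides for the 380-residue ORF.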

The invention also encompasses agonists and antagonists 
of the described NHPs, including small molecules, large 
molecules, mutant NHPs, or portions thereof that compete
with native NHPs, NHP peptides, and antibodies, as well as 
nucleotide sequences that can be used to inhibit the expres- 
sion of the described NHPs (e.g., antisense and ribozyme 
molecules, and gene or regulatory sequence replacement 
constructs) or to enhance the expression of the described
NHP polynucleotides (e.g., expression constructs that place
the described gene under the control of a strong promoter
system). The present invention also includes both transgenic
animals that express a NHP transgene, and NHP
"knock-outs" (which can be conditional) that do not express a
functional NHP. Knockout murine ES cells have been
produced in a murine ortholog of the described NHPs.

Further, the present invention also relates to processes for
identifying compounds that modulate, i.e., act as agonists or
antagonists of, NHP expression and/or NHP product activity
that utilize purified preparations of the described NHPs
and/or NHP product, or cells expressing the same. Such
compounds can be used as therapeutic agents for the
treatment of any of a wide variety of symptoms associated with
biological disorders or imbalances.

4. DESCRIPTION OF THE SEQUENCE LISTING 
AND FIGURES 

The Sequence Listing provides the sequences of the novel
human ORFs encoding the described novel human
phosphatase proteins. SEQ ID NO:31 describes a NHP ORF and
flanking sequences.

5. DETAILED DESCRIPTION OF THE 
INVENTION

The NHPs, described for the first time herein, are novel
proteins that are expressed in, inter alia, human cell lines,
and human brain, pituitary, kidney, testis, thyroid, adrenal
gland, stomach, heart, uterus, placenta, mammary gland,
adipose, esophagus, cervix, rectum, pericardium, ovary,
fetal kidney and gene trapped human cells. The described
sequences were compiled from gene trapped sequences in
conjunction with sequences available in GENBANK, and
cDNAs isolated from human testis and thyroid cDNA
libraries (Edge Biosystems, Gaithersburg, Md.).

The present invention encompasses the nucleotides
presented in the Sequence Listing, host cells expressing such
nucleotides, the expression products of such nucleotides,
and: (a) nucleotides that encode mammalian homologs of
the described genes, including the specifically described
NHPs, and the NHP products; (b) nucleotides that encode
one or more portions of an NHP that correspond to
functional domains, and the polypeptide products specified by
such nucleotide sequences, including but not limited to the
novel regions of any active domain(s); (c) isolated
nucleotides that encode mutant versions, engineered or naturally
occurring, of the described NHPs in which all or a part of at
least one domain is deleted or altered, and the polypeptide
products specified by such nucleotide sequences, including
but not limited to soluble proteins and peptides in which all
or a portion of the signal sequence is deleted; (d) nucleotides
that encode chimeric fusion proteins containing all or a
portion of a coding region of a NHP, or one of its domains
(e.g., a receptor/ligand binding domain, accessory
protein/self-association domain, etc.) fused to another peptide or
polypeptide; or (e) therapeutic or diagnostic derivatives of
the described polynucleotides such as oligonucleotides,
antisense polynucleotides, ribozymes, dsRNA, or gene therapy
constructs comprising a sequence first disclosed in the
Sequence Listing. As discussed above, the present invention
includes: (a) the human DNA sequences presented in the
Sequence Listing (and vectors comprising the same) and
additionally contemplates any nucleotide sequence encoding
a contiguous NHP open reading frame (ORF) that hybridizes
to a complement of a DNA sequence presented in the
Sequence Listing under highly stringent conditions, e.g., 
hybridization to filter-bound DNA in 0.5 M NaHPO4, 1% sodium dodecyl
sulfate (SDS), 1 mM EDTA at 65° C., and washing in 0.1xSSC/0.1% SDS
at 68° C. (Ausubel F. M. et al., eds., 1989, Current Protocols in
Molecular Biology, Vol. I, Green Publishing Associates, Inc., and
John Wiley & Sons, Inc., New York, at p. 2.10.3) and encodes a
functionally equivalent gene product. Additionally contemplated are
any nucleotide sequences that hybridize to the complement of the DNA
sequence that encode and express an amino acid sequence presented in
the Sequence Listing under moderately stringent conditions, e.g.,
washing in 0.2xSSC/0.1% SDS at 42° C. (Ausubel et al., 1989, supra),
yet still encode a functionally equivalent NHP product. Functional
equivalents of a NHP include naturally occurring NHPs present in
other species and mutant NHPs whether naturally occurring or
engineered (by site directed mutagenesis, gene shuffling, directed
evolution as described in, for example, U.S. Pat. Nos. 5,723,323 and
5,837,458 both of which are herein incorporated by reference). The
invention also includes degenerate nucleic acid variants of the
disclosed NHP polynucleotide sequences.

Additionally contemplated are polynucleotides encoding NHP ORFs, or
their functional equivalents, encoded by polynucleotide sequences
that are about 99, 95, 90, or about 85 percent similar to
corresponding regions of a sequence presented in the Sequence Listing
(as measured by BLAST sequence comparison analysis using, for
example, the GCG sequence analysis package using default parameters).

The invention also includes nucleic acid molecules, preferably DNA
molecules, that hybridize to, and are therefore the complements of,
the described NHP encoding polynucleotides. Such hybridization
conditions can be highly stringent or less highly stringent, as
described above. In instances where the nucleic acid molecules are
deoxyoligonucleotides ("DNA oligos"), such molecules are generally
about 16 to about 100 bases long, or about 20 to about 80, or about
34 to about 45 bases long, or any variation or combination of sizes
represented therein that incorporate a contiguous region of sequence
first disclosed in the Sequence Listing. Such oligonucleotides can be
used in conjunction with the polymerase chain reaction (PCR) to
screen libraries, isolate clones, and prepare cloning and sequencing
templates, etc.

Alternatively, such NHP oligonucleotides can be used as hybridization
probes for screening libraries, and assessing gene expression
patterns (particularly using a micro array or high-throughput "chip"
format). Additionally, a series of the described NHP oligonucleotide
sequences, or the complements thereof, can be used to represent all
or a portion of the described NHP sequences. The oligonucleotides,
typically between about 16 to about 40 (or any whole number within
the stated range) nucleotides in length may partially overlap each
other and/or the NHP sequence may be represented using
oligonucleotides that do not overlap. Accordingly, the described NHP
polynucleotide sequences shall typically comprise at least about two
or three distinct oligonucleotide sequences of at least about 18, and
preferably about 25, nucleotides in length that are each first
disclosed in the described Sequence Listing. Such oligonucleotide
sequences may begin at any nucleotide present within a sequence in
the Sequence Listing and proceed in either a sense (5'-to-3')
orientation vis-a-vis the described sequence or in an antisense
orientation.

For oligonucleotide probes, highly stringent conditions may refer,
e.g., to washing in 6xSSC/0.05% sodium pyrophosphate at 37° C. (for
14-base oligos), 48° C. (for 17-base oligos), 55° C. (for 20-base
oligos), and 60° C. (for 23-base oligos). These nucleic acid
molecules may encode or act as NHP gene antisense molecules, useful,
for example, in NHP gene regulation (for and/or as antisense primers
in amplification reactions of NHP gene nucleic acid sequences). With
respect to NHP gene regulation, such techniques can be used to
regulate biological functions. Further, such sequences can be used as
part of ribozyme and/or triple helix sequences that are also useful
for NHP gene regulation.

Inhibitory antisense or double stranded oligonucleotides can
additionally comprise at least one modified base moiety which is
selected from the group including but not limited to 5-fluorouracil,
5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine, xanthine,
4-acetylcytosine, 5-(carboxyhydroxylmethyl) uracil,
5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine, 5-methylcytosine,
N6-adenine, 7-methylguanine, 5-methylaminomethyluracil,
5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine,
5-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic acid (v),
wybutoxosine, pseudouracil, queosine, 2-thiocytosine,
5-methyl-2-thiouracil, 2-thiouracil, 4-thiouracil, 5-methyluracil,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic acid (v),
5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil,
(acp3)w, and 2,6-diaminopurine.

The antisense oligonucleotide can also comprise at least one modified
sugar moiety selected from the group including but not limited to
2-fluoroarabinose, xylulose, and hexose.

In yet another embodiment, the antisense oligonucleotide will
comprise at least one modified phosphate backbone selected from the
group consisting of a phosphorothioate, a phosphorodithioate, a
phosphoramidothioate, a phosphoramidate, a phosphordiamidate, a
methylphosphonate, an alkyl phosphotriester, and a formacetal or
analog thereof.

In yet another embodiment, the antisense oligonucleotide is an
α-anomeric oligonucleotide. An α-anomeric oligonucleotide forms
specific double-stranded hybrids with complementary RNA in which,
contrary to the usual β-units, the strands run parallel to each other
(Gautier et al., 1987, Nucl. Acids Res. 15:6625-6641). The
oligonucleotide is a 2'-O-methylribonucleotide (Inoue et al., 1987,
Nucl. Acids Res. 15:6131-6148), or a chimeric RNA-DNA analogue (Inoue
et al., 1987, FEBS Lett. 215:327-330). Alternatively, double stranded
RNA can be used to disrupt the expression and function of a targeted
NHP.

Oligonucleotides of the invention can be synthesized by standard
methods known in the art, e.g. by use of an automated DNA synthesizer
(such as are commercially available from Biosearch, Applied
Biosystems, etc.). As examples, phosphorothioate oligonucleotides can
be synthesized by the method of Stein et al. (1988, Nucl. Acids Res.
16:3209), and methylphosphonate oligonucleotides can be prepared by
use of controlled pore glass polymer supports (Sarin et al., 1988,
Proc. Natl. Acad. Sci. U.S.A. 85:7448-7451), etc.

Low stringency conditions are well known to those of skill in the
art, and will vary predictably depending on the specific organisms
from which the library and the labeled sequences are derived. For
guidance regarding such conditions see, for example, Sambrook et al.,
1989, Molecular
Cloning, A Laboratory Manual (and periodic updates
thereof), Cold Spring Harbor Press, N.Y.; and Ausubel et
al., 1989, Current Protocols in Molecular Biology, Green 
Publishing Associates and Wiley Interscience, N.Y. 
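
As a rough illustration of how the wash temperatures quoted above for oligonucleotide probes (6xSSC/0.05% sodium pyrophosphate) scale with probe length, the following sketch stores the four stated values in a lookup table. The function name and the linear interpolation between the stated lengths are assumptions made for illustration only; the text gives values only for 14-, 17-, 20-, and 23-base oligos.

```python
# Wash temperatures (deg C) stated in the text for highly stringent
# washing of oligonucleotide probes, keyed by oligo length in bases.
HIGH_STRINGENCY_WASH_C = {14: 37, 17: 48, 20: 55, 23: 60}


def wash_temperature_c(oligo_length):
    """Return a wash temperature in deg C for the given oligo length.

    Stated lengths return the stated value. Lengths between stated
    values are linearly interpolated, which is an illustrative
    assumption and not something the text specifies.
    """
    lengths = sorted(HIGH_STRINGENCY_WASH_C)
    if oligo_length < lengths[0] or oligo_length > lengths[-1]:
        raise ValueError("length outside the range given in the text")
    if oligo_length in HIGH_STRINGENCY_WASH_C:
        return float(HIGH_STRINGENCY_WASH_C[oligo_length])
    for lo, hi in zip(lengths, lengths[1:]):
        if lo < oligo_length < hi:
            frac = (oligo_length - lo) / (hi - lo)
            t_lo = HIGH_STRINGENCY_WASH_C[lo]
            t_hi = HIGH_STRINGENCY_WASH_C[hi]
            return t_lo + frac * (t_hi - t_lo)


if __name__ == "__main__":
    for n in (14, 17, 20, 21, 23):
        print(n, round(wash_temperature_c(n), 1))
```

Lengths outside 14 to 23 bases raise an error rather than extrapolate, since the text offers no guidance there.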

Alternatively, suitably labeled NHP nucleotide probes can 
be used to screen a human genomic library using appropri- 
ately stringent conditions or by PCR. The identification and 
characterization of human genomic clones is helpful for 
identifying polymorphisms (including, but not limited to, 
nucleotide repeats, microsatellite alleles, single nucleotide 
polymorphisms, or coding single nucleotide 
polymorphisms), determining the genomic structure of a 
given locus/allele, and designing diagnostic tests. For 
example, sequences derived from regions adjacent to the 
intron/exon boundaries of the human gene can be used to 
design primers for use in amplification assays to detect 
mutations within the exons, introns, splice sites (e.g., splice 
acceptor and/or donor sites), etc., that can be used in 
diagnostics and pharmacogenetics. 
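
For a coding single nucleotide polymorphism, the affected amino acid residue follows directly from the nucleotide position. The sketch below assumes, for illustration, that the open reading frame begins at nucleotide 1 of the nucleotide sequence and that positions are numbered from 1; the function names are hypothetical. Under that assumption it reproduces the pairings given in Section 5.1 (nucleotide 76 of SEQ ID NO:1 with residue 26 of SEQ ID NO:2, and nucleotide 706 with residue 236).

```python
def codon_index(nucleotide_position, orf_start=1):
    """Return the 1-based codon (amino acid) number containing a nucleotide.

    orf_start is the 1-based position of the first base of the initiator
    codon; the default of 1 is an assumption for illustration.
    """
    offset = nucleotide_position - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset // 3 + 1


def position_in_codon(nucleotide_position, orf_start=1):
    """Return 1, 2, or 3: which base of its codon the nucleotide occupies."""
    offset = nucleotide_position - orf_start
    if offset < 0:
        raise ValueError("position lies upstream of the ORF start")
    return offset % 3 + 1


if __name__ == "__main__":
    for pos in (76, 706):
        print(pos, codon_index(pos), position_in_codon(pos))
```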

Further, a NHP gene homolog can be isolated from 
nucleic acid from an organism of interest by performing 
PCR using two degenerate or "wobble" oligonucleotide 
primer pools designed on the basis of amino acid sequences 
within the NHP products disclosed herein. The template for 
the reaction may be total RNA, mRNA, and/or cDNA 
obtained by reverse transcription of mRNA prepared from, 
for example, human or non-human cell lines or tissue known 
or suspected to express an allele of a NHP gene. 
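
When degenerate primer pools are designed from an amino acid sequence, the size of the pool is the product of the number of codons for each residue. The following is a minimal sketch, assuming the standard genetic code; the function names and the example peptide are hypothetical and are not taken from the Sequence Listing.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G with the third base
# varying fastest; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid.
CODON_COUNTS = {}
for aa in CODON_TABLE.values():
    CODON_COUNTS[aa] = CODON_COUNTS.get(aa, 0) + 1


def pool_degeneracy(peptide):
    """Number of distinct oligonucleotides that can encode the peptide."""
    total = 1
    for aa in peptide.upper():
        total *= CODON_COUNTS[aa]
    return total


def least_degenerate_window(peptide, width):
    """Return (start_index, window, degeneracy) for the window of the
    given width whose primer pool is smallest; start_index is 0-based."""
    best = None
    for i in range(len(peptide) - width + 1):
        window = peptide[i:i + width]
        d = pool_degeneracy(window)
        if best is None or d < best[2]:
            best = (i, window, d)
    return best


if __name__ == "__main__":
    print(pool_degeneracy("LSR"))
    print(least_degenerate_window("LSRMWKDE", 3))
```

Regions rich in methionine and tryptophan give the smallest pools because each of those residues has a single codon.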

The PCR product can be subcloned and sequenced to 
ensure that the amplified sequences represent the sequence 
of the desired NHP gene. The PCR fragment can then be 
used to isolate a full length cDNA clone by a variety of 
methods. For example, the amplified fragment can be 
labeled and used to screen a cDNA library, such as a 
bacteriophage cDNA library. Alternatively, the labeled frag- 
ment can be used to isolate genomic clones via the screening 
of a genomic library. 

PCR technology can also be used to isolate full length 
cDNA sequences. For example, RNA can be isolated, fol- 
lowing standard procedures, from an appropriate cellular or 
tissue source (i.e., one known, or suspected, to express a 
NHP gene). A reverse transcription (RT) reaction can be 
performed on the RNA using an oligonucleotide primer 
specific for the most 5' end of the amplified fragment for the 
priming of first strand synthesis. The resulting RNA/DNA 
hybrid may then be "tailed" using a standard terminal 
transferase reaction, the hybrid may be digested with RNase 
H, and second strand synthesis may then be primed with a 
complementary primer. Thus, cDNA sequences upstream of 
the amplified fragment can be isolated. For a review of 
cloning strategies that can be used, see e.g., Sambrook et al., 
1989, supra. 

A cDNA encoding a mutant NHP gene can be isolated, for
example, by using PCR. In this case, the first cDNA strand 
may be synthesized by hybridizing an oligo-dT oligonucle- 
otide to mRNA isolated from tissue known or suspected to 
be expressed in an individual putatively carrying a mutant 
NHP allele, and by extending the new strand with reverse 
transcriptase. The second strand of the cDNA is then syn-
thesized using an oligonucleotide that hybridizes specifi-
cally to the 5' end of the normal gene. Using these two
primers, the product is then amplified via PCR, optionally 
cloned into a suitable vector, and subjected to DNA 
sequence analysis through methods well known to those of 
skill in the art. By comparing the DNA sequence of the 
mutant NHP allele to that of a corresponding normal NHP 
allele, the mutation(s) responsible for the loss or alteration
of function of the mutant NHP gene product can be ascer- 
tained. 
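
The comparison of a mutant allele sequence with the corresponding normal sequence can be sketched as a position-by-position scan. This example assumes the two sequences are already aligned and of equal length, so that only substitutions are reported; insertions and deletions would need a real alignment step. The function name and the short sequences are hypothetical.

```python
def point_differences(normal, mutant):
    """List substitutions between two pre-aligned, equal-length sequences.

    Returns (position, normal_base, mutant_base) tuples with 1-based
    positions. Equal length is an assumption of this sketch.
    """
    if len(normal) != len(mutant):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (i + 1, n, m)
        for i, (n, m) in enumerate(zip(normal.upper(), mutant.upper()))
        if n != m
    ]


if __name__ == "__main__":
    print(point_differences("ATGGCC", "ATGGAC"))
```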

Alternatively, a genomic library can be constructed using 
DNA obtained from an individual suspected of or known to
carry a mutant NHP allele (e.g., a person manifesting a
NHP-associated phenotype such as, for example, immune
disorders, obesity, high blood pressure, etc.), or a cDNA
library can be constructed using RNA from a tissue known,
or suspected, to express a mutant NHP allele. A normal NHP
gene, or any suitable fragment thereof, can then be labeled 
and used as a probe to identify the corresponding mutant 
NHP allele in such libraries. Clones containing mutant NHP 
gene sequences can then be purified and subjected to 
sequence analysis according to methods well known to those
skilled in the art. 

Additionally, an expression library can be constructed 
utilizing cDNA synthesized from, for example, RNA iso-
lated from a tissue known, or suspected, to express a mutant
NHP allele in an individual suspected of or known to carry
such a mutant allele. In this manner, gene products made by
the putatively mutant tissue may be expressed and screened
using standard antibody screening techniques in conjunction
with antibodies raised against a normal NHP product, as
described below. (For screening techniques, see, for
example, Harlow, E. and Lane, eds., 1988, "Antibodies: A
Laboratory Manual", Cold Spring Harbor Press, Cold Spring
Harbor.) Additionally, screening can be accomplished by
screening with labeled NHP fusion proteins, such as, for
example, AP-NHP or NHP-AP fusion proteins. In cases
where a NHP mutation results in an expressed gene product
with altered function (e.g., as a result of a missense or a
frameshift mutation), polyclonal antibodies to a NHP are
likely to cross-react with a corresponding mutant NHP gene
product. Library clones detected via their reaction with such
labeled antibodies can be purified and subjected to sequence 
analysis according to methods well known in the art. 

An additional application of the described novel human 
polynucleotide sequences is their use in the molecular
mutagenesis/evolution of proteins that are at least partially
encoded by the described novel sequences using, for
example, polynucleotide shuffling or related methodologies.
Such approaches are described in U.S. Pat. Nos. 5,830,721
and 5,837,458 which are herein incorporated by reference in
their entirety.

The invention also encompasses (a) DNA vectors that 
contain any of the foregoing NHP coding sequences and/or 
their complements (i.e., antisense); (b) DNA expression 
vectors that contain any of the foregoing NHP coding
sequences operatively associated with a regulatory element
that directs the expression of the coding sequences (for
example, baculovirus as described in U.S. Pat. No.
5,869,336 herein incorporated by reference); (c) genetically engi-
neered host cells that contain any of the foregoing NHP
coding sequences operatively associated with a regulatory
element that directs the expression of the coding sequences
in the host cell; and (d) genetically engineered host cells that
express an endogenous NHP gene under the control of an
exogenously introduced regulatory element (i.e., gene
activation). As used herein, regulatory elements include but
are not limited to inducible and non-inducible promoters,
enhancers, operators and other elements known to those
skilled in the art that drive and regulate expression. Such
regulatory elements include but are not limited to the
cytomegalovirus hCMV immediate early gene, regulatable,
viral (particularly retroviral LTR promoters) the early or late 
promoters of SV40 adenovirus, the lac system, the trp 
system, the TAC system, the TRC system, the major operator and
promoter regions of phage lambda, the control regions of fd coat
protein, the promoter for 3-phosphoglycerate kinase (PGK), the
promoters of acid phosphatase, and the promoters of the yeast
α-mating factors.

Where, as in the present instance, some of the described NHP peptides
or polypeptides are thought to be cytoplasmic proteins, expression
systems can be engineered that produce soluble derivatives of a NHP
(corresponding to a NHP extracellular and/or intracellular domains,
or truncated polypeptides lacking one or more hydrophobic domains)
and/or NHP fusion protein products (especially NHP-Ig fusion
proteins, i.e., fusions of a NHP domain to an IgFc), NHP antibodies,
and anti-idiotypic antibodies (including Fab fragments) that can be
used in therapeutic applications. Preferably, the above expression
systems are engineered to allow the desired peptide or polypeptide to
be recovered from the culture media.

The present invention also encompasses antibodies and anti-idiotypic
antibodies (including Fab fragments), antagonists and agonists of a
NHP, as well as compounds or nucleotide constructs that inhibit
expression of a NHP gene (transcription factor inhibitors, antisense
and ribozyme molecules, or gene or regulatory sequence replacement
constructs), or promote the expression of a NHP (e.g., expression
constructs in which NHP coding sequences are operatively associated
with expression control elements such as promoters,
promoter/enhancers, etc.).

The NHPs or NHP peptides, NHP fusion proteins, NHP nucleotide
sequences, antibodies, antagonists and agonists can be useful for the
detection of mutant NHPs or inappropriately expressed NHPs for the
diagnosis of disease. The NHP proteins or peptides, NHP fusion
proteins, NHP nucleotide sequences, host cell expression systems,
antibodies, antagonists, agonists and genetically engineered cells
and animals can be used for screening for drugs (or high throughput
screening of combinatorial libraries) effective in the treatment of
the symptomatic or phenotypic manifestations of perturbing the normal
function of a NHP in the body. The use of engineered host cells
and/or animals can offer an advantage in that such systems allow not
only for the identification of compounds that bind to the endogenous
receptor/ligand of a NHP, but can also identify compounds that
trigger NHP-mediated activities or pathways.

Finally, the NHP products can be used as therapeutics. For example,
soluble derivatives such as NHP peptides/domains corresponding to
NHPs, NHP fusion protein products (especially NHP-Ig fusion proteins,
i.e., fusions of a NHP, or a domain of a NHP, to an IgFc), NHP
antibodies and anti-idiotypic antibodies (including Fab fragments),
antagonists or agonists (including compounds that modulate or act on
downstream targets in a NHP-mediated pathway) can be used to directly
treat diseases or disorders. For instance, the administration of an
effective amount of soluble NHP, or a NHP-IgFc fusion protein or an
anti-idiotypic antibody (or its Fab) that mimics the NHP could
activate or effectively antagonize the endogenous NHP or a protein
interactive therewith. Nucleotide constructs encoding such NHP
products can be used to genetically engineer host cells to express
such products in vivo; these genetically engineered cells function as
"bioreactors" in the body delivering a continuous supply of a NHP, a
NHP peptide, or a NHP fusion protein to the body. Nucleotide
constructs encoding functional NHPs, mutant NHPs, as well as
antisense and ribozyme molecules can also be used in "gene therapy"
approaches for the modulation of NHP expression. Thus, the invention
also encompasses pharmaceutical formulations and methods for treating
biological disorders.

Various aspects of the invention are described in greater detail in
the subsections below.

5.1 THE NHP SEQUENCES

The cDNA sequences and corresponding deduced amino acid sequences of
the described NHPs are presented in the Sequence Listing. The NHP
nucleotide sequences were obtained using the sequence information
present in human gene trapped [illegible] other cDNA sequences,
[illegible] the described [illegible] in a wide variety of human
[illegible] gene trapped [illegible]. In addition to [illegible]
phosphatases, the described NHPs also share significant similarity to
a range of additional [illegible] from a range of phyla and species.
Given the physiological importance of protein phosphatases and other
proteins that display structural relatedness to the described NHPs,
such proteins have been subject to intense scrutiny as [illegible]
discussed in U.S. Pat. Nos. 5,939,271 and [illegible], which describe
a variety of uses and applications that can be applied to the
described NHP sequences and which are herein incorporated by
reference in their entirety.

Several polymorphisms were identified during sequencing, such as an
A-C transversion that can occur in the sequence region represented
by, for example, nucleotide position 76 of SEQ ID NO:1, which can
result in an L or M being present in the corresponding amino acid
sequence at position, for example, 26 of SEQ ID NO:2, and an A-G
transition that can occur in the sequence region represented by, for
example, nucleotide position 706 of SEQ ID NO:1, which can result in
a T or A being present in the corresponding amino acid sequence at,
for example, position 236 of SEQ ID NO:2. The present invention
contemplates sequences incorporating any of the above polymorphisms
as well as all combinations and permutations thereof.

The gene encoding the described NHPs is apparently present on human
chromosome 15 or human chromosome 3 (see GENBANK accession nos.
AC012378 and AC012674). Accordingly, the described sequences are
useful for identifying and mapping the coding regions of the human
genome as well as identifying biologically validating functional exon
splice junctions.

5.2 NHPS AND NHP POLYPEPTIDES

The described NHP products, polypeptides, peptide fragments, mutated,
truncated, or deleted forms of the NHPs, and/or NHP fusion proteins
can be prepared for a variety of uses, including but not limited to
the generation of antibodies, as reagents in diagnostic assays (e.g.,
for cancer, neuronal abnormalities, Bardet-Biedl Syndrome, etc.), the
identification of other cellular gene products related to the NHP, as
reagents in assays for screening for compounds that can be used as
pharmaceutical reagents useful in the therapeutic treatment of
mental, biological, or medical disorders and disease.

The Sequence Listing discloses the amino acid sequence encoded by the
described NHP-encoding polynucleotides. The NHPs have initiator
methionines in DNA sequence contexts consistent with a eucaryotic
translation initiation site, and display an apparent signal sequence
near the N-terminus which indicates that the NHPs can be membrane
associated, secreted, or cytoplasmic.

The NHP amino acid sequences of the invention include the amino acid
sequences presented in the Sequence Listing
as well as analogues and derivatives thereof. Further, corresponding
NHP homologues from other species are encompassed by the invention.
In fact, any NHP protein encoded by the NHP nucleotide sequences
described above are within the scope of the invention, as are any
novel polynucleotide sequences encoding all or any novel portion of
an amino acid sequence presented in the Sequence Listing. The
degenerate nature of the genetic code is well known, and,
accordingly, each amino acid presented in the Sequence Listing, is
generically representative of the well known nucleic acid "triplet"
codon, or in many cases codons, that can encode the amino acid. As
such, as contemplated herein, the amino acid sequences presented in
the Sequence Listing, when taken together with the genetic code (see,
for example, Table 4-1 at page 109 of "Molecular Cell Biology", 1986,
J. Darnell et al. eds., Scientific American Books, New York, N.Y.,
herein incorporated by reference) are generically representative of
all the various permutations and combinations of nucleic acid
sequences that can encode such amino acid sequences.

The invention also encompasses proteins that are functionally
equivalent to the NHPs encoded by the presently described nucleotide
sequences as judged by any of a number of criteria, including, but
not limited to, the ability to bind and modify a NHP substrate, or
the ability to effect an identical or complementary downstream
pathway, or a change in cellular metabolism (e.g., proteolytic
activity, ion flux, tyrosine phosphorylation, etc.). Such
functionally equivalent NHP proteins include, but are not limited to,
additions or substitutions of amino acid residues within the amino
acid sequence encoded by a NHP nucleotide sequence described above,
but which result in a silent change, thus producing a functionally
equivalent gene product. Amino acid substitutions may be made on the
basis of similarity in polarity, charge, solubility, hydrophobicity,
hydrophilicity, and/or the amphipathic nature of the residues
involved. For example, nonpolar (hydrophobic) amino acids include
alanine, leucine, isoleucine, valine, proline, phenylalanine,
tryptophan, and methionine; polar neutral amino acids include
glycine, serine, threonine, cysteine, tyrosine, asparagine, and
glutamine; positively charged (basic) amino acids include arginine,
lysine, and histidine; and negatively charged (acidic) amino acids
include aspartic acid and glutamic acid.

A variety of host-expression vector systems can be used to express
the NHP nucleotide sequences of the invention. Where the NHP peptide
or polypeptide can exist, or has been engineered to exist, as a
soluble or secreted molecule, the soluble NHP peptide or polypeptide
can be recovered from the culture media. Such expression systems also
encompass engineered host cells that express a NHP, or functional
equivalent, in situ. Purification or enrichment of a NHP from such
expression systems can be accomplished using appropriate detergents
and lipid micelles and methods well known to those skilled in the
art. However, such engineered host cells themselves may be used in
situations where it is important not only to retain the structural
and functional characteristics of the NHP, but to assess biological
activity, e.g., in drug screening assays.

The expression systems that may be used for purposes of the invention
include but are not limited to microorganisms such as bacteria (e.g.,
E. coli, B. subtilis) transformed with recombinant bacteriophage DNA,
plasmid DNA or cosmid DNA expression vectors containing NHP
nucleotide sequences; yeast (e.g., Saccharomyces, Pichia) transformed
with recombinant yeast expression vectors containing NHP nucleotide
sequences; insect cell systems infected with recombinant virus
expression vectors (e.g., baculovirus) containing NHP sequences;
plant cell systems infected with recombinant virus expression vectors
(e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or
transformed with recombinant plasmid expression vectors (e.g., Ti
plasmid) containing NHP nucleotide sequences; or mammalian cell
systems (e.g., COS, CHO, BHK, 293, 3T3) harboring recombinant
expression constructs containing promoters derived from the genome of
mammalian cells (e.g., metallothionein promoter) or from mammalian
viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K
promoter).

In bacterial systems, a number of expression vectors may be
advantageously selected depending upon the use intended for the NHP
product being expressed. For example, when a large quantity of such a
protein is to be produced for the generation of pharmaceutical
compositions of or containing NHP, or for raising antibodies to a
NHP, vectors that direct the expression of high levels of fusion
protein products that are readily purified may be desirable. Such
vectors include, but are not limited to, the E. coli expression
vector pUR278 (Ruther et al., 1983, EMBO J. 2:1791), in which a NHP
coding sequence may be ligated individually into the vector in frame
with the lacZ coding region so that a fusion protein is produced; pIN
vectors (Inouye & Inouye, 1985, Nucleic Acids Res. 13:3101-3109; Van
Heeke & Schuster, 1989, J. Biol. Chem. 264:5503-5509); and the like.
pGEX vectors may also be used to express foreign polypeptides as
fusion proteins with glutathione S-transferase (GST). In general,
such fusion proteins are soluble and can easily be purified from
lysed cells by adsorption to glutathione-agarose beads followed by
elution in the presence of free glutathione. The pGEX vectors are
designed to include thrombin or factor Xa protease cleavage sites so
that the cloned target gene product can be released from the GST
moiety.

In an insect system, Autographa californica nuclear polyhedrosis
virus (AcNPV) is used as a vector to express foreign genes. The virus
grows in Spodoptera frugiperda cells. A NHP encoding polynucleotide
sequence can be cloned individually into non-essential regions (for
example the polyhedrin gene) of the virus and placed under control of
an AcNPV promoter (for example the polyhedrin promoter). Successful
insertion of NHP gene coding sequence will result in inactivation of
the polyhedrin gene and production of non-occluded recombinant virus
(i.e., virus lacking the proteinaceous coat coded for by the
polyhedrin gene). These recombinant viruses are then used to infect
Spodoptera frugiperda cells in which the inserted gene is expressed
(e.g., see Smith et al., 1983, J. Virol. 46:584; Smith, U.S. Pat. No.
4,215,051).

In mammalian host cells, a number of viral-based expression systems
may be utilized. In cases where an adenovirus is used as an
expression vector, the NHP nucleotide sequence of interest may be
ligated to an adenovirus transcription/translation control complex,
e.g., the late promoter and tripartite leader sequence. This chimeric
gene can then be inserted in the adenovirus genome by in vitro or in
vivo recombination. Insertion in a non-essential region of the viral
genome (e.g., region E1 or E3) will result in a recombinant virus
that is viable and capable of expressing a NHP product in infected
hosts (e.g., See Logan & Shenk, 1984, Proc. Natl. Acad. Sci. USA
81:3655-3659). Specific initiation signals may also be required for
efficient translation of inserted NHP nucleotide sequences. These
signals include the ATG initiation codon and adjacent sequences. In
cases where an entire NHP gene or cDNA, including its own initiation
codon and adjacent sequences, is inserted into the
appropriate expression vector, no additional translational control
signals may be needed. However, in cases where only a portion of a
NHP coding sequence is inserted, exogenous translational control
signals, including, perhaps, the ATG initiation codon, must be
provided. Furthermore, the initiation codon must be in phase with the
reading frame of the desired coding sequence to ensure translation of
the entire insert. These exogenous translational control signals and
initiation codons can be of a variety of origins, both natural and
synthetic. The efficiency of expression may be enhanced by the
inclusion of appropriate transcription enhancer elements,
transcription terminators, etc. (See Bitter et al., 1987, Methods in
Enzymol. 153:516-544).

In addition, a host cell strain may be chosen that modulates the
expression of the inserted sequences, or modifies and processes the
gene product in the specific fashion desired. Such modifications
(e.g., glycosylation) and processing (e.g., cleavage) of protein
products may be important for the function of the protein. Different
host cells have characteristic and specific mechanisms for the
post-translational processing and modification of proteins and gene
products. Appropriate cell lines or host systems can be chosen to
ensure the correct modification and processing of the foreign protein
expressed. To this end, eukaryotic host cells which possess the
cellular machinery for proper processing of the primary transcript,
glycosylation, and phosphorylation of the gene product may be used.
Such mammalian host cells include, but are not limited to, CHO, VERO,
BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and in particular, human cell
lines.

For long-term, high-yield production of recombinant proteins, stable
expression is preferred. For example, cell lines that stably express
the NHP sequences described above can be engineered. Rather than
using expression vectors

Alternatively, any fusion protein can be readily purified by
utilizing an antibody specific for the fusion protein being
expressed. For example, a system described by Janknecht et al. allows
for the ready purification of non-denatured fusion proteins expressed
in human cell lines (Janknecht, et al., 1991, Proc. Natl. Acad. Sci.
USA 88:8972-8976). In this system, the gene of interest is subcloned
into a vaccinia recombination plasmid such that the gene's open
reading frame is translationally fused to an amino-terminal tag
consisting of six histidine residues. Extracts from cells infected
with recombinant vaccinia virus are loaded onto Ni2+ nitriloacetic
acid-agarose columns and histidine-tagged proteins are selectively
eluted with imidazole-containing buffers.

5.3 Antibodies to NHP Products

Antibodies that specifically recognize one or more epitopes of a NHP,
or epitopes of conserved variants of a NHP, or peptide fragments of a
NHP are also encompassed by the invention. Such antibodies include
but are not limited to polyclonal antibodies, monoclonal antibodies
(mAbs), humanized or chimeric antibodies, single chain antibodies,
Fab fragments, F(ab')2 fragments, fragments produced by a Fab
expression library, anti-idiotypic (anti-Id) antibodies, and
epitope-binding fragments of any of the above.

The antibodies of the invention can be used, for example, in the
detection of NHP in a biological sample and may, therefore, be
utilized as part of a diagnostic or prognostic technique whereby
patients may be tested for abnormal amounts of NHP. Such antibodies
may also be utilized in conjunction with, for example, compound
screening schemes for the evaluation of the effect of test compounds
on expression and/or activity of a NHP gene product,
which contain viral origins of replication, host cells can be 35 Additionally, such antibodies can be used in conjunction 
transformed with DNA controlled by appropriate expression &™ thera Py to > for example, evaluate the normal and/or 
control elements (e.g., promoter, enhancer sequences, tran- engineered NHP-expressing cells prior to their introduction 
scription terminators, polyadenylation sites, etc.), and a mto the patient. Such antibodies may additionally be used as 
selectable marker. Following the introduction of the foreign a ™ iho * for ^ inhibition of abnormal NHP activity. Thus, 
DNA, engineered cells may be allowed to grow for 1-2 days 40 such antibodies may, therefore, be utilized as part of treat- 
in an enriched media, and then are switched to a selective m ent methods. 

media. The selectable marker in the recombinant plasmid For the production of antibodies, various host animals 

confers resistance to the selection and allows cells to stably may be immunized by injection with the NHP, a NHP 

integrate the plasmid into their chromosomes and grow to peptide (e.g., one corresponding to a functional domain of a 

form foci which in turn can be cloned and expanded into cell 45 NHP), truncated NHP polypeptides (NHP in which one or 

lines. This method may advantageously be used to engineer more domains have been deleted), functional equivalents of 

cell lines which express the NHP product. Such engineered the NHP or mutated variant of the NHP. Such host animals 

cell lines may be particularly useful in screening and evalu- may include but are not limited to pigs, rabbits, mice, goats, 

ation of compounds that affect the endogenous activity of the and rats, to name but a few. Various adjuvants may be used 

NHP product. 50 to increase the immunological response, depending on the 

A number of selection systems can be used, including but . host species, including but not limited to Freund's (complete 

not limited to the herpes simplex virus thymidine kinase and incomplete), mineral gels such as aluminum hydroxide, 

(Wigler, et al, 1977, Cell 11:223), hypoxanthine-guanine surface active substances such as lysolecithin, pluronic 

phosphoribosyltransferase (Szybalska & Szybalski, 1962, polyols, polyanions, peptides, oil emulsions, keyhole limpet 

Proc. Natl. Acad. Sci. USA 48:2026), and adenine phospho- 55 hemocyanin, dinitrophenol, and potentially useful human 

ribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes adjuvants such as BCG (bacille Calmette-Guerin) and 

can be employed in tk~, hgprf or aprt" cells, respectively. Corynebacteriumparvum. Polyclonal antibodies are hetero- 

Also, antimetabolite resistance can be used as the basis of geneous populations of antibody molecules derived from the 

selection for the following genes: dhfr, which confers resis- sera of the immunized animals. 

tance to methotrexate (Wigler, et al., 1980, Nad. Acad. Sci. 60 Monoclonal antibodies, which are homogeneous popula- 

USA 77:3567; O 'Hare, et al., 1981, Proc. Natl. Acad. Sci. tions of antibodies to a particular antigen, can be obtained by 

USA 78:1527); gpt, which confers resistance to mycophe- any technique which provides for the production of antibody 

nolic acid (Mulligan & Berg, 1981, Proc. Nad. Acad. Sci. molecules by continuous cell lines in culture. These include, 

USA 78:2072); neo, which confers resistance to the ami- but are not limited to, the hybridoma technique of Kohler 

noglycoside G-418 (Cblberre-Garapin, et al., 1981, J. Mol. 65 and Milstein, (1975, Nature 256:495-497; and U.S. Pat. No. 

Biol. 150:1); and hygro, which confers resistance to hygro- 4,376,110), the human B-cell hybridoma technique (Kosbor 

mycin (Santerre, et al., 1984, Gene 30:147). et al., 1983, Immunology Today 4:72; Cole et al, 1983, 
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Proc. Natl. Acad. Sci. USA 80:2026-2030), and the EBV- 
hybridoma technique (Cole et al., 1985, Monoclonal Anti- 
bodies And Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). 
Such antibodies may be of any immunoglobulin class 
including IgG, IgM, IgE, IgA, IgD and any subclass thereof. 5 
The hybridoma producing the mAb of this invention may be 
cultivated in vitro or . in vivo. Production of high titers of ; 
mAbs in vivo makes this the presently preferred method of 
production. 

In addition, techniques developed for the production of
"chimeric antibodies" (Morrison et al., 1984, Proc. Natl.
Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature,
312:604-608; Takeda et al., 1985, Nature, 314:452-454) by
splicing the genes from a mouse antibody molecule of
appropriate antigen specificity together with genes from a
human antibody molecule of appropriate biological activity
can be used. A chimeric antibody is a molecule in which
different portions are derived from different animal species,
such as those having a variable region derived from a murine
mAb and a human immunoglobulin constant region.

Alternatively, techniques described for the production of
single chain antibodies (U.S. Pat. No. 4,946,778; Bird, 1988,
Science 242:423-426; Huston et al., 1988, Proc. Natl. Acad.
Sci. USA 85:5879-5883; and Ward et al., 1989, Nature
341:544-546) can be adapted to produce single chain
antibodies against NHP gene products. Single chain antibodies
are formed by linking the heavy and light chain
fragments of the Fv region via an amino acid bridge,
resulting in a single chain polypeptide.

Antibody fragments which recognize specific epitopes
may be generated by known techniques. For example, such
fragments include, but are not limited to: the F(ab')2 fragments
which can be produced by pepsin digestion of the
antibody molecule and the Fab fragments which can be
generated by reducing the disulfide bridges of the F(ab')2
fragments. Alternatively, Fab expression libraries may be
constructed (Huse et al., 1989, Science, 246:1275-1281) to
allow rapid and easy identification of monoclonal Fab
fragments with the desired specificity.

Antibodies to a NHP can, in turn, be utilized to generate 
anti-idiotype antibodies that "mimic" a given NHP, using 
techniques well known to those skilled in the art. (See, e.g., 
Greenspan & Bona, 1993, FASEB J 7(5):437-444; and
Nisonoff, 1991, J. Immunol. 147(8):2429-2438). For
example, antibodies which bind to a NHP domain and
competitively inhibit the binding of NHP to its cognate 
receptor/ligand can be used to generate anti-idiotypes that 
"mimic" the NHP and, therefore, bind, activate, or neutralize 
a NHP, NHP receptor, or NHP ligand. Such anti-idiotypic 
antibodies or Fab fragments of such anti-idiotypes can be 
used in therapeutic regimens involving a NHP mediated 
pathway. 

The present invention is not to be limited in scope by the 
specific embodiments described herein, which are intended 
as single illustrations of individual aspects of the invention, 
and functionally equivalent methods and components are 
within the scope of the invention. Indeed, various modifi- 
cations of the invention, in addition to those shown and 
described herein will become apparent to those skilled in the 
art from the foregoing description. Such modifications are 
intended to fall within the scope of the appended claims. All 
cited publications, patents, and patent applications are herein 
incorporated by reference in their entirety. 



SEQUENCE LISTING 

<160> NUMBER OF SEQ ID NOS: 31 

<210> SEQ ID NO 1 

<211> LENGTH: 3210 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 1 



atggcgcctc ctctgcgacc cctcgcccgg ctgcgaccgc cggggatgct gctccgcgcg 60

ctcctgctcc tgctgmtgct cagtcctttg ccaggagtgt ggtgctttag cgaactgtct 120

tttgtaaaag aaccacagga tgtaactgtc acaagaaagg acccagtcgt tttagattgc 180

caggctcacg gagaagttcc tattaaggtc acatggttga aaaatggagc aaaaatgtct 240

gaaaataaac ggatcgaggt tctttctaac ggctctttat acatcagtga ggtggaaggc 300

aggcgaggag agcagtccga tgaaggattt tatcagtgct tggcaatgaa caaatatgga 360

gccattctta gtcaaaaagc tcatcttgcc ttatcaacta tttctgcatt tgaagtccag 420

ccaatttcca ctgaggtcca cgaaggtgga gttgctcgat ttgcatgcaa gatttcatcc 480

caccctcctg cagtcataac atgggagttc aatcggacaa ctctacctat gactatggac 540

aggataactg ccctaccaac aggagtattg cagatctatg atgtcagcca aagggattct 600

ggaaattatc gttgtattgc tgccactgta gcccaccgac gtaaaagtat ggaggcctcg 660

ctaactgtga ttccagctaa ggagtcaaaa tccttccaca caccarcaat tatagcaggt 720

ccacagaaca taacaacatc tcttcatcag actgtagttt tggaatgcat ggccacagga 780



aatcccaaac caatcatttc ttggagccgc cttgatcaca aatccattga tgtctttaat 840

actcgggtac ttggaaatgg taatctcatg atatctgatg tcaggctaca acatgctgga 900

gtatatgttt gtcgggccac tacccctggc acacgcaact ttacagttgc tatggcaact 960

ttaactgtat tagctcctcc ttcatttgtt gaatggccag aaagtttaac aaggcctcga 1020

gctggcactg ctcgatttgt gtgtcaggca gaaggaatcc cctctcccaa gatgtcatgg 1080

ttgaaaaatg gaaggaagat acattcgaat ggtagaatta aaatgtacaa cagtaaattg 1140

gtaattaacc agattattcc tgaagatgat gctatttatc agtgcatggc tgagaatagc 1200

caaggatcta ttttatctag agccagactg actgtagtga tgtcagasga cagacccagt 1260

gctccctata atgtacatgc tgaaaccatg tcaagctcag ccattctttt agcctgggag 1320

aggccacttt ataattcaga caaagtcatt gcctattctg tacactacat gaaagcagaa 1380

ggtttaaata atgaagagta tcaagtagtc atcggaaatg acacaactca ttatattatt 1440

gatgacttag agcctgccag caattatact ttctacattg tagcatatat gccaatggga 1500

gccagccaga tgtctgacca tgtgacacag aatactctag aggatgttcc cctgagacct 1560

cctgaaatta gtttgacaag tcgaagtccc actgatattc tcatctcctg gctgccaatc 1620

ccagccaaat atcggcgggg ccaagtggtg ctgtatcgct tgtctttccg cctaagtact 1680

gagaattcaa tccaagttct ggagctcccg gggaccacgc atgagtacct tttggaaggc 1740

ctgaaacctg acagtgtcta cctggttcgg attactgctg ccaccagagt ggggctggga 1800

gagtcatcag tatggacttc acataggacg cccaaagcta caagcgtgaa agcccctaag 1860

tctccagagt tgcatttgga gcctctgaac tgtaccacca tttctgtgag gtggcagcaa 1920

gatgtagagg acacagctgc tattcagggc tacaagctgt actacaagga agaagggcag 1980

caggagaatg ggcccatttt cttggatacc aaggacctac tctatactct cagtggctta 2040

gaccccagaa gaaaatatca tgtgagactc ctggcttaca acaacataga cgatggctat 2100

caggcagatc agactgtcag cactccagga tgcgtgtctg ttcgtgatcg catggtccct 2160

cctccaccac caccccacca tctctatgcg aaggctaaca cctcatcttc catcttcctg 2220

cactggagga ggcctgcatt caccgctgca caaatcatta actacaccat ccgctgtaat 2280

cctgttggcc tgcagaatgc ttctttggtt ctgtaccttc aaacatcaga aactcacatg 2340

ttggttcaag gtctagaacc aaacaccaaa tacgaatttg ccgttcgatt acatgtggat 2400

cagctttcca gtccttggag ccctgtagtc taccattcta ctcttccaga agcaccagca 2460

ggcccaccag ttggagtaaa agtgacatta atagaggatg acactgccct ggtttcttgg 2520

aaaccccctg atggcccaga aacagttgtg acccgctata ctatcttata tgcatctagg 2580

aaggcctgga ttgcaggaga gtggcaggtc ttacaccgtg aaggggcaat aaccatggct 2640

ttgctagaaa acttggtagc aggaaatgtg tacattgtca agatatctgc atccaatgag 2700

gtgggagaag gacccttttc aaattctgtg gagctggcag tacttccaaa ggaaacctct 2760

gaatcaaatc agaggcccaa gcgtttagat tctgctgatg ccaaagttta ttcaggatat 2820

taccatctgg accaaaaatc aatgactggc attgctgtag gtgttggcat agccttgacc 2880

tgcatcctca tctgtgttct catcttgata taccgaagta aagccaggaa atcatctgct 2940

tccaagacgg cacagaatgg aactcaacag ttacctcgta ccagtgcctc cttagctagt 3000

ggaaatgagg taggaaagaa cctggaagga gctgtaggaa atgaagaatc tttaatgcca 3060

atgatcatgc caaacagctt cattgatgca aaggtactga gctgcgggat ttgctgcata 3120

agccgttctt ccattcctcc tccctgtgtg tgtaaaatgt acttccccca aaattgtatg 3180
ttgaatgtat tataccaata ctcttattaa 3210 



<210> SEQ ID NO 2 

<211> LENGTH: 1069 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 2

Met Ala Pro Pro Leu Arg Pro Leu Ala Arg Leu Arg Pro Pro Gly Met
1 5 10 15

Leu Leu Arg Ala Leu Leu Leu Leu Leu Leu Leu Ser Pro Leu Pro Gly
20 25 30

Val Trp Cys Phe Ser Glu Leu Ser Phe Val Lys Glu Pro Gln Asp Val
35 40 45

Thr Val Thr Arg Lys Asp Pro Val Val Leu Asp Cys Gln Ala His Gly
50 55 60

Glu Val Pro Ile Lys Val Thr Trp Leu Lys Asn Gly Ala Lys Met Ser
65 70 75 80

Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr Ile Ser
85 90 95

Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe Tyr Gln
100 105 110

Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys Ala His
115 120 125

Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile Ser Thr
130 135 140

Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile Ser Ser
145 150 155 160

His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr Leu Pro
165 170 175

Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu Gln Ile
180 185 190

Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile Ala Ala
195 200 205

Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr Val Ile
210 215 220

Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile Ala Gly
225 230 235 240

Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu Glu Cys
245 250 255

Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg Leu Asp
260 265 270

His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn Gly Asn
275 280 285

Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr Val Cys
290 295 300

Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met Ala Thr
305 310 315 320

Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu Ser Leu
325 330 335

Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala Glu Gly
340 345 350

Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys Ile His
355 360 365

Ser Asn Gly Arg Ile Lys Met Tyr Asn Ser Lys Leu Val Ile Asn Gln
370 375 380

Ile Ile Pro Glu Asp Asp Ala Ile Tyr Gln Cys Met Ala Glu Asn Ser
385 390 395 400

Gln Gly Ser Ile Leu Ser Arg Ala Arg Leu Thr Val Val Met Ser Glu
405 410 415

Asp Arg Pro Ser Ala Pro Tyr Asn Val His Ala Glu Thr Met Ser Ser
420 425 430

Ser Ala Ile Leu Leu Ala Trp Glu Arg Pro Leu Tyr Asn Ser Asp Lys
435 440 445

Val Ile Ala Tyr Ser Val His Tyr Met Lys Ala Glu Gly Leu Asn Asn
450 455 460

Glu Glu Tyr Gln Val Val Ile Gly Asn Asp Thr Thr His Tyr Ile Ile
465 470 475 480

Asp Asp Leu Glu Pro Ala Ser Asn Tyr Thr Phe Tyr Ile Val Ala Tyr
485 490 495

Met Pro Met Gly Ala Ser Gln Met Ser Asp His Val Thr Gln Asn Thr
500 505 510

Leu Glu Asp Val Pro Leu Arg Pro Pro Glu Ile Ser Leu Thr Ser Arg
515 520 525

Ser Pro Thr Asp Ile Leu Ile Ser Trp Leu Pro Ile Pro Ala Lys Tyr
530 535 540

Arg Arg Gly Gln Val Val Leu Tyr Arg Leu Ser Phe Arg Leu Ser Thr
545 550 555 560

Glu Asn Ser Ile Gln Val Leu Glu Leu Pro Gly Thr Thr His Glu Tyr
565 570 575

Leu Leu Glu Gly Leu Lys Pro Asp Ser Val Tyr Leu Val Arg Ile Thr
580 585 590

Ala Ala Thr Arg Val Gly Leu Gly Glu Ser Ser Val Trp Thr Ser His
595 600 605

Arg Thr Pro Lys Ala Thr Ser Val Lys Ala Pro Lys Ser Pro Glu Leu
610 615 620

His Leu Glu Pro Leu Asn Cys Thr Thr Ile Ser Val Arg Trp Gln Gln
625 630 635 640

Asp Val Glu Asp Thr Ala Ala Ile Gln Gly Tyr Lys Leu Tyr Tyr Lys
645 650 655

Glu Glu Gly Gln Gln Glu Asn Gly Pro Ile Phe Leu Asp Thr Lys Asp
660 665 670

Leu Leu Tyr Thr Leu Ser Gly Leu Asp Pro Arg Arg Lys Tyr His Val
675 680 685

Arg Leu Leu Ala Tyr Asn Asn Ile Asp Asp Gly Tyr Gln Ala Asp Gln
690 695 700

Thr Val Ser Thr Pro Gly Cys Val Ser Val Arg Asp Arg Met Val Pro
705 710 715 720

Pro Pro Pro Pro Pro His His Leu Tyr Ala Lys Ala Asn Thr Ser Ser
725 730 735

Ser Ile Phe Leu His Trp Arg Arg Pro Ala Phe Thr Ala Ala Gln Ile
740 745 750

Ile Asn Tyr Thr Ile Arg Cys Asn Pro Val Gly Leu Gln Asn Ala Ser
755 760 765

Leu Val Leu Tyr Leu Gln Thr Ser Glu Thr His Met Leu Val Gln Gly
770 775 780

Leu Glu Pro Asn Thr Lys Tyr Glu Phe Ala Val Arg Leu His Val Asp
785 790 795 800

Gln Leu Ser Ser Pro Trp Ser Pro Val Val Tyr His Ser Thr Leu Pro
805 810 815

Glu Ala Pro Ala Gly Pro Pro Val Gly Val Lys Val Thr Leu Ile Glu
820 825 830

Asp Asp Thr Ala Leu Val Ser Trp Lys Pro Pro Asp Gly Pro Glu Thr
835 840 845

Val Val Thr Arg Tyr Thr Ile Leu Tyr Ala Ser Arg Lys Ala Trp Ile
850 855 860

Ala Gly Glu Trp Gln Val Leu His Arg Glu Gly Ala Ile Thr Met Ala
865 870 875 880

Leu Leu Glu Asn Leu Val Ala Gly Asn Val Tyr Ile Val Lys Ile Ser
885 890 895

Ala Ser Asn Glu Val Gly Glu Gly Pro Phe Ser Asn Ser Val Glu Leu
900 905 910

Ala Val Leu Pro Lys Glu Thr Ser Glu Ser Asn Gln Arg Pro Lys Arg
915 920 925

Leu Asp Ser Ala Asp Ala Lys Val Tyr Ser Gly Tyr Tyr His Leu Asp
930 935 940

Gln Lys Ser Met Thr Gly Ile Ala Val Gly Val Gly Ile Ala Leu Thr
945 950 955 960

Cys Ile Leu Ile Cys Val Leu Ile Leu Ile Tyr Arg Ser Lys Ala Arg
965 970 975

Lys Ser Ser Ala Ser Lys Thr Ala Gln Asn Gly Thr Gln Gln Leu Pro
980 985 990

Arg Thr Ser Ala Ser Leu Ala Ser Gly Asn Glu Val Gly Lys Asn Leu
995 1000 1005

Glu Gly Ala Val Gly Asn Glu Glu Ser Leu Met Pro Met Ile Met Pro
1010 1015 1020

Asn Ser Phe Ile Asp Ala Lys Val Leu Ser Cys Gly Ile Cys Cys Ile
1025 1030 1035 1040

Ser Arg Ser Ser Ile Pro Pro Pro Cys Val Cys Lys Met Tyr Phe Pro
1045 1050 1055

Gln Asn Cys Met Leu Asn Val Leu Tyr Gln Tyr Ser Tyr
1060 1065 
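
The correspondence between the nucleotide listing of SEQ ID NO 1 and the amino acid listing of SEQ ID NO 2 can be illustrated with a short sketch. It assumes the standard genetic code and the usual IUPAC reading of the ambiguity symbols that appear in the listing (m = a or c, r = a or g, s = c or g); the function names `translate` and `expand` are hypothetical helpers introduced only for this illustration and are not part of the specification.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order for each position.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC symbols used in the listing (assumed standard meanings).
IUPAC = {"a": "a", "c": "c", "g": "g", "t": "t", "m": "ac", "r": "ag", "s": "cg"}


def translate(dna: str) -> str:
    """Translate an unambiguous DNA string (spaces allowed) in frame 1."""
    dna = dna.replace(" ", "").lower()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def expand(codon: str) -> list:
    """List every unambiguous codon that an ambiguous codon can stand for."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon.lower()))]


if __name__ == "__main__":
    # First 60 nucleotides of SEQ ID NO 1 as printed in the listing.
    first_row = ("atggcgcctc ctctgcgacc cctcgcccgg "
                 "ctgcgaccgc cggggatgct gctccgcgcg")
    print(translate(first_row))
    # The ambiguous codon "mtg" (nucleotides 76-78) and its possible residues.
    print({c: CODON_TABLE[c] for c in expand("mtg")})
```

Under these assumptions the first row translates to the same twenty residues that open SEQ ID NO 2, and the ambiguous codon at residue 26 resolves to either Met or Leu, of which the listing reports Leu.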



<210> SEQ ID NO 3 

<211> LENGTH: 1143 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 3

atggcgcctc ctctgcgacc cctcgcccgg ctgcgaccgc cggggatgct gctccgcgcg 60 

ctcctgctcc tgctgmtgct cagtcctttg ccaggagtgt ggtgctttag cgaactgtct 120

tttgtaaaag aaccacagga tgtaactgtc acaagaaagg acccagtcgt tttagattgc 180 

caggctcacg gagaagttcc tattaaggtc acatggttga aaaatggagc aaaaatgtct 240 

gaaaataaac ggatcgaggt tctttctaac ggctctttat acatcagtga ggtggaaggc 300 

aggcgaggag agcagtccga tgaaggattt tatcagtgct tggcaatgaa caaatatgga 360 

gccattctta gtcaaaaagc tcatcttgcc ttatcaacta tttctgcatt tgaagtccag 420 

ccaatttcca ctgaggtcca cgaaggtgga gttgctcgat ttgcatgcaa gatttcatcc 480

caccctcctg cagtcataac atgggagttc aatcggacaa ctctacctat gactatggac 540

aggataactg ccctaccaac aggagtattg cagatctatg atgtcagcca aagggattct 600

ggaaattatc gttgtattgc tgccactgta gcccaccgac gtaaaagtat ggaggcctcg 660 

ctaactgtga ttccagctaa ggagtcaaaa tccttccaca caccarcaat tatagcaggt 720 

ccacagaaca taacaacatc tcttcatcag actgtagttt tggaatgcat ggccacagga 780 

aatcccaaac caatcatttc ttggagccgc cttgatcaca aatccattga tgtctttaat 840 

actcgggtac ttggaaatgg taatctcatg atatctgatg tcaggctaca acatgctgga 900 

gtatatgttt gtcgggccac tacccctggc acacgcaact ttacagttgc tatggcaact 960 

ttaactgtat tagctcctcc ttcatttgtt gaatggccag aaagtttaac aaggcctcga 1020 

gctggcactg ctcgatttgt gtgtcaggca gaaggaatcc cctctcccaa gatgtcatgg 1080 

ttgaaaaatg gaaggaagat acattcgaat ggtagaatta aaatgtacaa caggtttaaa 1140 

taa 1143 



<210> SEQ ID NO 4 

<211> LENGTH: 380 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 4 

Met Ala Pro Pro Leu Arg Pro Leu Ala Arg Leu Arg Pro Pro Gly Met
1 5 10 15

Leu Leu Arg Ala Leu Leu Leu Leu Leu Leu Leu Ser Pro Leu Pro Gly
20 25 30

Val Trp Cys Phe Ser Glu Leu Ser Phe Val Lys Glu Pro Gln Asp Val
35 40 45

Thr Val Thr Arg Lys Asp Pro Val Val Leu Asp Cys Gln Ala His Gly
50 55 60

Glu Val Pro Ile Lys Val Thr Trp Leu Lys Asn Gly Ala Lys Met Ser
65 70 75 80

Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr Ile Ser
85 90 95

Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe Tyr Gln
100 105 110

Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys Ala His
115 120 125

Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile Ser Thr
130 135 140

Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile Ser Ser
145 150 155 160

His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr Leu Pro
165 170 175

Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu Gln Ile
180 185 190

Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile Ala Ala
195 200 205

Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr Val Ile
210 215 220

Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile Ala Gly
225 230 235 240

Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu Glu Cys
245 250 255

Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg Leu Asp
260 265 270

His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn Gly Asn
275 280 285

Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr Val Cys
290 295 300

Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met Ala Thr
305 310 315 320

Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu Ser Leu
325 330 335

Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala Glu Gly
340 345 350

Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys Ile His
355 360 365

Ser Asn Gly Arg Ile Lys Met Tyr Asn Arg Phe Lys
370 375 380 
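
The declared lengths in the <211> fields can be cross-checked arithmetically. The sketch below assumes that each DNA entry is a complete open reading frame ending in a single stop codon, so that a sequence of N nucleotides encodes N/3 - 1 residues; the names `DECLARED`, `expected_residues` and `check` are hypothetical and used only for this illustration.

```python
# (DNA SEQ ID NO, declared nucleotides, protein SEQ ID NO, declared residues),
# copied from the <211> LENGTH fields of SEQ ID NOS 1-6 in the listing.
DECLARED = [
    (1, 3210, 2, 1069),
    (3, 1143, 4, 380),
    (5, 2715, 6, 904),
]


def expected_residues(n_nt: int) -> int:
    """Residues encoded by an ORF of n_nt nucleotides that ends in one stop codon."""
    if n_nt % 3 != 0:
        raise ValueError("length is not a whole number of codons")
    return n_nt // 3 - 1  # subtract the terminal stop codon


def check(declared=DECLARED) -> dict:
    """Map each (DNA id, protein id) pair to whether the declared lengths agree."""
    return {
        (dna_id, prt_id): expected_residues(n_nt) == n_aa
        for dna_id, n_nt, prt_id, n_aa in declared
    }


if __name__ == "__main__":
    for pair, ok in check().items():
        print(pair, "consistent" if ok else "inconsistent")
```

For example, 3210 nucleotides correspond to 1070 codons, of which the last is the stop codon taa, leaving the 1069 residues declared for SEQ ID NO 2.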



<210> SEQ ID NO 5 

<211> LENGTH: 2715 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 5 

atggcgcctc ctctgcgacc cctcgcccgg ctgcgaccgc cggggatgct gctccgcgcg 60 

ctcctgctcc tgctgmtgct cagtcctttg ccaggagtgt ggtgctttag cgaactgtct 120

tttgtaaaag aaccacagga tgtaactgtc acaagaaagg acccagtcgt tttagattgc 180 

caggctcacg gagaagttcc tattaaggtc acatggttga aaaatggagc aaaaatgtct 240

gaaaataaac ggatcgaggt tctttctaac ggctctttat acatcagtga ggtggaaggc 300 

aggcgaggag agcagtccga tgaaggattt tatcagtgct tggcaatgaa caaatatgga 360 

gccattctta gtcaaaaagc tcatcttgcc ttatcaacta tttctgcatt tgaagtccag 420 

ccaatttcca ctgaggtcca cgaaggtgga gttgctcgat ttgcatgcaa gatttcatcc 480 

caccctcctg cagtcataac atgggagttc aatcggacaa ctctacctat gactatggac 540 

aggataactg ccctaccaac aggagtattg cagatctatg atgtcagcca aagggattct 600 

ggaaattatc gttgtattgc tgccactgta gcccaccgac gtaaaagtat ggaggcctcg 660 

ctaactgtga ttccagctaa ggagtcaaaa tccttccaca caccarcaat tatagcaggt 720 

ccacagaaca taacaacatc tcttcatcag actgtagttt tggaatgcat ggccacagga 780

aatcccaaac caatcatttc ttggagccgc cttgatcaca aatccattga tgtctttaat 840 

actcgggtac ttggaaatgg taatctcatg atatctgatg tcaggctaca acatgctgga 900

gtatatgttt gtcgggccac tacccctggc acacgcaact ttacagttgc tatggcaact 960 

ttaactgtat tagctcctcc ttcatttgtt gaatggccag aaagtttaac aaggcctcga 1020 

gctggcactg ctcgatttgt gtgtcaggca gaaggaatcc cctctcccaa gatgtcatgg 1080 

ttgaaaaatg gaaggaagat acattcgaat ggtagaatta aaatgtacaa cagtaaattg 1140 

gtaattaacc agattattcc tgaagatgat gctatttatc agtgcatggc tgagaatagc 1200 

caaggatcta ttttatctag agccagactg actgtagtga tgtcagaaga cagacccagt 1260 

gctccctata atgtacatgc tgaaaccatg tcaagctcag ccattctttt agcctgggag 1320 

aggccacttt ataattcaga caaagtcatt gcctattctg tacactacat gaaagcagaa 1380

ggtttaaata atgaagagta tcaagtagtc atcggaaatg acacaactca ttatattatt 1440

gatgacttag agcctgccag caattatact ttctacattg tagcatatat gccaatggga 1500 

gccagccaga tgtctgacca tgtgacacag aatactctag aggatgaccc cagaagaaaa 1560 

tatcatgtga gactcctggc ttacaacaac atagacgatg gctatcaggc agatcagact 1620 

gtcagcactc caggatgcgt gtctgttcgt gatcgcatgg tccctcctcc accaccaccc 1680 

caccatctct atgcgaaggc taacacctca tcttccatct tcctgcactg gaggaggcct 1740 

gcattcaccg ctgcacaaat cattaactac accatccgct gtaatcctgt tggcctgcag 1800 

aatgcttctt tggttctgta ccttcaaaca tcagaaactc acatgttggt tcaaggtcta 1860 

gaaccaaaca ccaaatacga atttgccgtt cgattacatg tggatcagct ttccagtcct 1920 

tggagccctg tagtctacca ttctactctt ccagaagcac cagcaggccc accagttgga 1980 

gtaaaagtga cattaataga ggatgacact gccctggttt cttggaaacc ccctgatggc 2040 

ccagaaacag ttgtgacccg ctatactatc ttatatgcat ctaggaaggc ctggattgca 2100 

ggagagtggc aggtcttaca ccgtgaaggg gcaataacca tggctttgct agaaaacttg 2160 

gtagcaggaa atgtgtacat tgtcaagata tctgcatcca atgaggtggg agaaggaccc 2220 

ttttcaaatt ctgtggagct ggcagtactt ccaaaggaaa cctctgaatc aaatcagagg 2280 

cccaagcgtt tagattctgc tgatgccaaa gtttattcag gatattacca tctggaccaa 2340 

aaatcaatga ctggcattgc tgtaggtgtt ggcatagcct tgacctgcat cctcatctgt 2400 

gttctcatct tgatataccg aagtaaagcc aggaaatcat ctgcttccaa gacggcacag 2460 

aatggaactc aacagttacc tcgtaccagt gcctccttag ctagtggaaa tgaggtagga 2520 

aagaacctgg aaggagctgt aggaaatgaa gaatctttaa tgccaatgat catgccaaac 2580 

agcttcattg atgcaaaggt actgagctgc gggatttgct gcataagccg ttcttccatt 2640 

cctcctccct gtgtgtgtaa aatgtacttc ccccaaaatt gtatgttgaa tgtattatac 2700 

caatactctt attaa 2715 

<210> SEQ ID NO 6 

<211> LENGTH: 904 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 6 

Met Ala Pro Pro Leu Arg Pro Leu Ala Arg Leu Arg Pro Pro Gly Met
1 5 10 15

Leu Leu Arg Ala Leu Leu Leu Leu Leu Leu Leu Ser Pro Leu Pro Gly
20 25 30

Val Trp Cys Phe Ser Glu Leu Ser Phe Val Lys Glu Pro Gln Asp Val
35 40 45

Thr Val Thr Arg Lys Asp Pro Val Val Leu Asp Cys Gln Ala His Gly
50 55 60

Glu Val Pro Ile Lys Val Thr Trp Leu Lys Asn Gly Ala Lys Met Ser
65 70 75 80

Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr Ile Ser
85 90 95

Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe Tyr Gln
100 105 110

Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys Ala His
115 120 125

Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile Ser Thr






130 135 140

Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile Ser Ser
145 150 155 160

His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr Leu Pro
165 170 175

Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu Gln Ile
180 185 190

Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile Ala Ala
195 200 205

Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr Val Ile
210 215 220

Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile Ala Gly
225 230 235 240

Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu Glu Cys
245 250 255

Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg Leu Asp
260 265 270

His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn Gly Asn
275 280 285

Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr Val Cys
290 295 300

Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met Ala Thr
305 310 315 320

Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu Ser Leu
325 330 335

Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala Glu Gly
340 345 350

Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys Ile His
355 360 365

Ser Asn Gly Arg Ile Lys Met Tyr Asn Ser Lys Leu Val Ile Asn Gln
370 375 380

Ile Ile Pro Glu Asp Asp Ala Ile Tyr Gln Cys Met Ala Glu Asn Ser
385 390 395 400

Gln Gly Ser Ile Leu Ser Arg Ala Arg Leu Thr Val Val Met Ser Glu
405 410 415

Asp Arg Pro Ser Ala Pro Tyr Asn Val His Ala Glu Thr Met Ser Ser
420 425 430

Ser Ala Ile Leu Leu Ala Trp Glu Arg Pro Leu Tyr Asn Ser Asp Lys
435 440 445

Val Ile Ala Tyr Ser Val His Tyr Met Lys Ala Glu Gly Leu Asn Asn
450 455 460

Glu Glu Tyr Gln Val Val Ile Gly Asn Asp Thr Thr His Tyr Ile Ile
465 470 475 480

Asp Asp Leu Glu Pro Ala Ser Asn Tyr Thr Phe Tyr Ile Val Ala Tyr
485 490 495

Met Pro Met Gly Ala Ser Gln Met Ser Asp His Val Thr Gln Asn Thr
500 505 510

Leu Glu Asp Asp Pro Arg Arg Lys Tyr His Val Arg Leu Leu Ala Tyr
515 520 525

Asn Asn Ile Asp Asp Gly Tyr Gln Ala Asp Gln Thr Val Ser Thr Pro
530 535 540

Gly Cys Val Ser Val Arg Asp Arg Met Val Pro Pro Pro Pro Pro Pro
545 550 555 560

His His Leu Tyr Ala Lys Ala Asn Thr Ser Ser Ser Ile Phe Leu His
565 570 575

Trp Arg Arg Pro Ala Phe Thr Ala Ala Gln Ile Ile Asn Tyr Thr Ile
580 585 590

Arg Cys Asn Pro Val Gly Leu Gln Asn Ala Ser Leu Val Leu Tyr Leu
595 600 605

Gln Thr Ser Glu Thr His Met Leu Val Gln Gly Leu Glu Pro Asn Thr
610 615 620

Lys Tyr Glu Phe Ala Val Arg Leu His Val Asp Gln Leu Ser Ser Pro
625 630 635 640

Trp Ser Pro Val Val Tyr His Ser Thr Leu Pro Glu Ala Pro Ala Gly
645 650 655

Pro Pro Val Gly Val Lys Val Thr Leu Ile Glu Asp Asp Thr Ala Leu
660 665 670

Val Ser Trp Lys Pro Pro Asp Gly Pro Glu Thr Val Val Thr Arg Tyr
675 680 685

Thr Ile Leu Tyr Ala Ser Arg Lys Ala Trp Ile Ala Gly Glu Trp Gln
690 695 700

Val Leu His Arg Glu Gly Ala Ile Thr Met Ala Leu Leu Glu Asn Leu
705 710 715 720

Val Ala Gly Asn Val Tyr Ile Val Lys Ile Ser Ala Ser Asn Glu Val
725 730 735

Gly Glu Gly Pro Phe Ser Asn Ser Val Glu Leu Ala Val Leu Pro Lys
740 745 750

Glu Thr Ser Glu Ser Asn Gln Arg Pro Lys Arg Leu Asp Ser Ala Asp
755 760 765

Ala Lys Val Tyr Ser Gly Tyr Tyr His Leu Asp Gln Lys Ser Met Thr
770 775 780

Gly Ile Ala Val Gly Val Gly Ile Ala Leu Thr Cys Ile Leu Ile Cys
785 790 795 800

Val Leu Ile Leu Ile Tyr Arg Ser Lys Ala Arg Lys Ser Ser Ala Ser
805 810 815

Lys Thr Ala Gln Asn Gly Thr Gln Gln Leu Pro Arg Thr Ser Ala Ser
820 825 830

Leu Ala Ser Gly Asn Glu Val Gly Lys Asn Leu Glu Gly Ala Val Gly
835 840 845

Asn Glu Glu Ser Leu Met Pro Met Ile Met Pro Asn Ser Phe Ile Asp
850 855 860

Ala Lys Val Leu Ser Cys Gly Ile Cys Cys Ile Ser Arg Ser Ser Ile
865 870 875 880

Pro Pro Pro Cys Val Cys Lys Met Tyr Phe Pro Gln Asn Cys Met Leu
885 890 895

Asn Val Leu Tyr Gln Tyr Ser Tyr
900 



<210> SEQ ID NO 7 

<211> LENGTH: 3453 

<212> TYPE: DNA

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 7 

atggcgcctc ctctgcgacc cctcgcccgg ctgcgaccgc cggggatgct gctccgcgcg 60 



ctcctgctcc tgctgmtgct cagtcctttg ccaggagtgt ggtgctttag cgaactgtct 120

tttgtaaaag aaccacagga tgtaactgtc acaagaaagg acccagtcgt tttagattgc 180



caggctcacg gagaagttcc tattaaggtc acatggttga aaaatggagc aaaaatgtct 240 

gaaaataaac ggatcgaggt tctttctaac ggctctttat acatcagtga ggtggaaggc 300

aggcgaggag agcagtccga tgaaggattt tatcagtgct tggcaatgaa caaatatgga 360 

gccattctta gtcaaaaagc tcatcttgcc ttatcaacta tttctgcatt tgaagtccag 420 

ccaatttcca ctgaggtcca cgaaggtgga gttgctcgat ttgcatgcaa gatttcatcc 480 

caccctcctg cagtcataac atgggagttc aatcggacaa ctctacctat gactatggac 540 

aggataactg ccctaccaac aggagtattg cagatctatg atgtcagcca aagggattct 600 

ggaaattatc gttgtattgc tgccactgta gcccaccgac gtaaaagtat ggaggcctcg 660 

ctaactgtga ttccagctaa ggagtcaaaa tccttccaca caccaacaat tatagcaggt 720 

ccacagaaca taacaacatc tcttcatcag actgtagttt tggaatgcat ggccacagga 780 

aatcccaaac caatcatttc ttggagccgc cttgatcaca aatccattga tgtctttaat 840 

actcgggtac ttggaaatgg taatctcatg atatctgatg tcaggctaca acatgctgga 900 

gtatatgttt gtcgggccac tacccctggc acacgcaact ttacagttgc tatggcaact 960 

ttaactgtat tagctcctcc ttcatttgtt gaatggccag aaagtttaac aaggcctcga 1020 

gctggcactg ctcgatttgt gtgtcaggca gaaggaatcc cctctcccaa gatgtcatgg 1080 

ttgaaaaatg gaaggaagat acattcgaat ggtagaatta aaatgtacaa cagtaaattg 1140 

gtaattaacc agattattcc tgaagatgat gctatttatc agtgcatggc tgagaatagc 1200 

caaggatcta ttttatctag agccagactg actgtagtga tgtcagaaga cagacccagt 1260 

gctccctata atgtacatgc tgaaaccatg tcaagctcag ccattctttt agcctgggag 1320 

aggccacttt ataattcaga caaagtcatt gcctattctg tacactacat gaaagcagaa 1380 

ggtttaaata atgaagagta tcaagtagtc atcggaaatg acacaactca ttatattatt 1440 

gatgacttag agcctgccag caattatact ttctacattg tagcatatat gccaatggga 1500

gccagccaga tgtctgacca tgtgacacag aatactctag aggatgttcc cctgagacct 1560 

cctgaaatta gtttgacaag tcgaagtccc actgatattc tcatctcctg gctgccaatc 1620 

ccagccaaat atcggcgggg ccaagtggtg ctgtatcgct tgtctttccg cctaagtact 1680 

gagaattcaa tccaagttct ggagctcccg gggaccacgc atgagtacct tttggaaggc 1740 

ctgaaacctg acagtgtcta cctggttcgg attactgctg ccaccagagt ggggctggga 1800 

gagtcatcag tatggacttc acataggacg cccaaagcta caagcgtgaa agcccctaag 1860 

tctccagagt tgcatttgga gcctctgaac tgtaccacca tttctgtgag gtggcagcaa 1920 

gatgtagagg acacagctgc tattcagggc tacaagctgt actacaagga agaagggcag 1980 

caggagaatg ggcccatttt cttggatacc aaggacctac tctatactct cagtggctta 2040 

gaccccagaa gaaaatatca tgtgagactc ctggcttaca acaacataga cgatggctat 2100 

caggcagatc agactgtcag cactccagga tgcgtgtctg ttcgtgatcg catggtccct 2160 

cctccaccac caccccacca tctctatgcg aaggctaaca cctcatcttc catcttcctg 2220 

cactggagga ggcctgcatt caccgctgca caaatcatta actacaccat ccgctgtaat 2280 

cctgttggcc tgcagaatgc ttctttggtt ctgtaccttc aaacatcaga aactcacatg 2340 

ttggttcaag gtctagaacc aaacaccaaa tacgaatttg ccgttcgatt acatgtggat 2400 

cagctttcca gtccttggag ccctgtagtc taccattcta ctcttccaga agcaccagca 2460 

ggcccaccag ttggagtaaa agtgacatta atagaggatg acactgccct ggtttcttgg 2520

aaaccccctg atggcccaga aacagttgtg acccgctata ctatcttata tgcatctagg 2580

aaggcctgga ttgcaggaga gtggcaggtc ttacaccgtg aaggggcaat aaccatggct 2640

ttgctagaaa acttggtagc aggaaatgtg tacattgtca agatatctgc atccaatgag 2700

gtgggagaag gacccttttc aaattctgtg gagctggcag tacttccaaa ggaaacctct 2760

gaatcaaatc agaggcccaa gcgtttagat tctgctgatg ccaaagttta ttcaggatat 2820

taccatctgg accaaaaatc aatgactggc attgctgtag gtgttggcat agccttgacc 2880

tgcatcctca tctgtgttct catcttgata taccgaagta aagccaggaa atcatctgct 2940

tccaagacgg cacagaatgg aactcaacag ttacctcgta ccagtgcctc cttagctagt 3000

ggaaatgagg taggaaagaa cctggaagga gctgtaggaa atgaagaatc tttaatgcca 3060

atgatcatgc caaacagctt cattgatgca aagggaggaa ctgacctgat aattaatagc 3120

tatggtccta taattaaaaa caactctaag aaaaagtggt tttttttcca agactcaaag 3180

aagatacaag ttgagcagcc tcaaagaaga tttactccag cggtctgctt ttaccagcca 3240

ggcaccactg tattaatcag tgatgaagac tcccctagct ccccaggtca gacaaccagc 3300

ttctcaagac cctttggtgt tgcagctgat acagaacatt cagcaaatag tgaaggcagc 3360

catgagactg gggattctgg gcggttttct catgagtcca acgatgagat acatctgtcc 3420

tcagttataa gtaccacacc ccccaacctc tga 3453



<210> SEQ ID NO 8

<211> LENGTH: 1150

<212> TYPE: PRT

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 8

Met Ala Pro Pro Leu Arg Pro Leu Ala Arg Leu Arg Pro Pro Gly Met
1 5 10 15

Leu Leu Arg Ala Leu Leu Leu Leu Leu Leu Leu Ser Pro Leu Pro Gly
20 25 30

Val Trp Cys Phe Ser Glu Leu Ser Phe Val Lys Glu Pro Gln Asp Val
35 40 45

Thr Val Thr Arg Lys Asp Pro Val Val Leu Asp Cys Gln Ala His Gly
50 55 60

Glu Val Pro Ile Lys Val Thr Trp Leu Lys Asn Gly Ala Lys Met Ser
65 70 75 80

Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr Ile Ser
85 90 95

Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe Tyr Gln
100 105 110

Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys Ala His
115 120 125

Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile Ser Thr
130 135 140

Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile Ser Ser
145 150 155 160

His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr Leu Pro
165 170 175

Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu Gln Ile
180 185 190

Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile Ala Ala
195 200 205

Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr Val Ile
210 215 220

Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile Ala Gly
225 230 235 240

Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu Glu Cys
245 250 255

Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg Leu Asp
260 265 270

His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn Gly Asn
275 280 285

Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr Val Cys
290 295 300

Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met Ala Thr
305 310 315 320

Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu Ser Leu
325 330 335

Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala Glu Gly
340 345 350

Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys Ile His
355 360 365

Ser Asn Gly Arg Ile Lys Met Tyr Asn Ser Lys Leu Val Ile Asn Gln
370 375 380

Ile Ile Pro Glu Asp Asp Ala Ile Tyr Gln Cys Met Ala Glu Asn Ser
385 390 395 400

Gln Gly Ser Ile Leu Ser Arg Ala Arg Leu Thr Val Val Met Ser Glu
405 410 415

Asp Arg Pro Ser Ala Pro Tyr Asn Val His Ala Glu Thr Met Ser Ser
420 425 430

Ser Ala Ile Leu Leu Ala Trp Glu Arg Pro Leu Tyr Asn Ser Asp Lys
435 440 445

Val Ile Ala Tyr Ser Val His Tyr Met Lys Ala Glu Gly Leu Asn Asn
450 455 460

Glu Glu Tyr Gln Val Val Ile Gly Asn Asp Thr Thr His Tyr Ile Ile
465 470 475 480

Asp Asp Leu Glu Pro Ala Ser Asn Tyr Thr Phe Tyr Ile Val Ala Tyr
485 490 495

Met Pro Met Gly Ala Ser Gln Met Ser Asp His Val Thr Gln Asn Thr
500 505 510

Leu Glu Asp Val Pro Leu Arg Pro Pro Glu Ile Ser Leu Thr Ser Arg
515 520 525

Ser Pro Thr Asp Ile Leu Ile Ser Trp Leu Pro Ile Pro Ala Lys Tyr
530 535 540

Arg Arg Gly Gln Val Val Leu Tyr Arg Leu Ser Phe Arg Leu Ser Thr
545 550 555 560

Glu Asn Ser Ile Gln Val Leu Glu Leu Pro Gly Thr Thr His Glu Tyr
565 570 575

Leu Leu Glu Gly Leu Lys Pro Asp Ser Val Tyr Leu Val Arg Ile Thr
580 585 590

Ala Ala Thr Arg Val Gly Leu Gly Glu Ser Ser Val Trp Thr Ser His
595 600 605

Arg Thr Pro Lys Ala Thr Ser Val Lys Ala Pro Lys Ser Pro Glu Leu
610 615 620

His Leu Glu Pro Leu Asn Cys Thr Thr Ile Ser Val Arg Trp Gln Gln
625 630 635 640

Asp Val Glu Asp Thr Ala Ala Ile Gln Gly Tyr Lys Leu Tyr Tyr Lys
645 650 655

Glu Glu Gly Gln Gln Glu Asn Gly Pro Ile Phe Leu Asp Thr Lys Asp
660 665 670

Leu Leu Tyr Thr Leu Ser Gly Leu Asp Pro Arg Arg Lys Tyr His Val
675 680 685

Arg Leu Leu Ala Tyr Asn Asn Ile Asp Asp Gly Tyr Gln Ala Asp Gln
690 695 700

Thr Val Ser Thr Pro Gly Cys Val Ser Val Arg Asp Arg Met Val Pro
705 710 715 720

Pro Pro Pro Pro Pro His His Leu Tyr Ala Lys Ala Asn Thr Ser Ser
725 730 735

Ser Ile Phe Leu His Trp Arg Arg Pro Ala Phe Thr Ala Ala Gln Ile
740 745 750

Ile Asn Tyr Thr Ile Arg Cys Asn Pro Val Gly Leu Gln Asn Ala Ser
755 760 765

Leu Val Leu Tyr Leu Gln Thr Ser Glu Thr His Met Leu Val Gln Gly
770 775 780

Leu Glu Pro Asn Thr Lys Tyr Glu Phe Ala Val Arg Leu His Val Asp
785 790 795 800

Gln Leu Ser Ser Pro Trp Ser Pro Val Val Tyr His Ser Thr Leu Pro
805 810 815

Glu Ala Pro Ala Gly Pro Pro Val Gly Val Lys Val Thr Leu Ile Glu
820 825 830

Asp Asp Thr Ala Leu Val Ser Trp Lys Pro Pro Asp Gly Pro Glu Thr
835 840 845

Val Val Thr Arg Tyr Thr Ile Leu Tyr Ala Ser Arg Lys Ala Trp Ile
850 855 860

Ala Gly Glu Trp Gln Val Leu His Arg Glu Gly Ala Ile Thr Met Ala
865 870 875 880

Leu Leu Glu Asn Leu Val Ala Gly Asn Val Tyr Ile Val Lys Ile Ser
885 890 895

Ala Ser Asn Glu Val Gly Glu Gly Pro Phe Ser Asn Ser Val Glu Leu
900 905 910

Ala Val Leu Pro Lys Glu Thr Ser Glu Ser Asn Gln Arg Pro Lys Arg
915 920 925

Leu Asp Ser Ala Asp Ala Lys Val Tyr Ser Gly Tyr Tyr His Leu Asp
930 935 940

Gln Lys Ser Met Thr Gly Ile Ala Val Gly Val Gly Ile Ala Leu Thr
945 950 955 960

Cys Ile Leu Ile Cys Val Leu Ile Leu Ile Tyr Arg Ser Lys Ala Arg
965 970 975

Lys Ser Ser Ala Ser Lys Thr Ala Gln Asn Gly Thr Gln Gln Leu Pro
980 985 990

Arg Thr Ser Ala Ser Leu Ala Ser Gly Asn Glu Val Gly Lys Asn Leu
995 1000 1005

Glu Gly Ala Val Gly Asn Glu Glu Ser Leu Met Pro Met Ile Met Pro
1010 1015 1020

Asn Ser Phe Ile Asp Ala Lys Gly Gly Thr Asp Leu Ile Ile Asn Ser
1025 1030 1035 1040 

Tyr Gly Pro Ile Ile Lys Asn Asn Ser Lys Lys Lys Trp Phe Phe Phe
1045 1050 1055

Gln Asp Ser Lys Lys Ile Gln Val Glu Gln Pro Gln Arg Arg Phe Thr
1060 1065 1070

Pro Ala Val Cys Phe Tyr Gln Pro Gly Thr Thr Val Leu Ile Ser Asp
1075 1080 1085

Glu Asp Ser Pro Ser Ser Pro Gly Gln Thr Thr Ser Phe Ser Arg Pro
1090 1095 1100

Phe Gly Val Ala Ala Asp Thr Glu His Ser Ala Asn Ser Glu Gly Ser
1105 1110 1115 1120

His Glu Thr Gly Asp Ser Gly Arg Phe Ser His Glu Ser Asn Asp Glu
1125 1130 1135

Ile His Leu Ser Ser Val Ile Ser Thr Thr Pro Pro Asn Leu
1140 1145 1150



<210> SEQ ID NO 9 

<211> LENGTH: 2958 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 



<400> SEQUENCE: 9

atggcgcctc ctctgcgacc cctcgcccgg ctgcgaccgc cggggatgct gctccgcgcg 60

ctcctgctcc tgctgmtgct cagtcctttg ccaggagtgt ggtgctttag cgaactgtct 120

tttgtaaaag aaccacagga tgtaactgtc acaagaaagg acccagtcgt tttagattgc 180

caggctcacg gagaagttcc tattaaggtc acatggttga aaaatggagc aaaaatgtct 240

gaaaataaac ggatcgaggt tctttctaac ggctctttat acatcagtga ggtggaaggc 300

aggcgaggag agcagtccga tgaaggattt tatcagtgct tggcaatgaa caaatatgga 360

gccattctta gtcaaaaagc tcatcttgcc ttatcaacta tttctgcatt tgaagtccag 420

ccaatttcca ctgaggtcca cgaaggtgga gttgctcgat ttgcatgcaa gatttcatcc 480

caccctcctg cagtcataac atgggagttc aatcggacaa ctctacctat gactatggac 540

aggataactg ccctaccaac aggagtattg cagatctatg atgtcagcca aagggattct 600

ggaaattatc gttgtattgc tgccactgta gcccaccgac gtaaaagtat ggaggcctcg 660

ctaactgtga ttccagctaa ggagtcaaaa tccttccaca caccarcaat tatagcaggt 720

ccacagaaca taacaacatc tcttcatcag actgtagttt tggaatgcat ggccacagga 780

aatcccaaac caatcatttc ttggagccgc cttgatcaca aatccattga tgtctttaat 840

actcgggtac ttggaaatgg taatctcatg atatctgatg tcaggctaca acatgctgga 900

gtatatgttt gtcgggccac tacccctggc acacgcaact ttacagttgc tatggcaact 960

ttaactgtat tagctcctcc ttcatttgtt gaatggccag aaagtttaac aaggcctcga 1020

gctggcactg ctcgatttgt gtgtcaggca gaaggaatcc cctctcccaa gatgtcatgg 1080

ttgaaaaatg gaaggaagat acattcgaat ggtagaatta aaatgtacaa cagtaaattg 1140

gtaattaacc agattattcc tgaagatgat gctatttatc agtgcatggc tgagaatagc 1200

caaggatcta ttttatctag agccagactg actgtagtga tgtcagaaga cagacccagt 1260

gctccctata atgtacatgc tgaaaccatg tcaagctcag ccattctttt agcctgggag 1320

aggccacttt ataattcaga caaagtcatt gcctattctg tacactacat gaaagcagaa 1380

ggtttaaata atgaagagta tcaagtagtc atcggaaatg acacaactca ttatattatt 1440

gatgacttag agcctgccag caattatact ttctacattg tagcatatat gccaatggga 1500

gccagccaga tgtctgacca tgtgacacag aatactctag aggatgaccc cagaagaaaa 1560

tatcatgtga gactcctggc ttacaacaac atagacgatg gctatcaggc agatcagact 1620

gtcagcactc caggatgcgt gtctgttcgt gatcgcatgg tccctcctcc accaccaccc 1680 

caccatctct atgcgaaggc taacacctca tcttccatct tcctgcactg gaggaggcct 1740

gcattcaccg ctgcacaaat cattaactac accatccgct gtaatcctgt tggcctgcag 1800 

aatgcttctt tggttctgta ccttcaaaca tcagaaactc acatgttggt tcaaggtcta 1860 

gaaccaaaca ccaaatacga atttgccgtt cgattacatg tggatcagct ttccagtcct 1920 

tggagccctg tagtctacca ttctactctt ccagaagcac cagcaggccc accagttgga 1980 

gtaaaagtga cattaataga ggatgacact gccctggttt cttggaaacc ccctgatggc 2040 

ccagaaacag ttgtgacccg ctatactatc ttatatgcat ctaggaaggc ctggattgca 2100 

ggagagtggc aggtcttaca ccgtgaaggg gcaataacca tggctttgct agaaaacttg 2160 

gtagcaggaa atgtgtacat tgtcaagata tctgcatcca atgaggtggg agaaggaccc 2220 

ttttcaaatt ctgtggagct ggcagtactt ccaaaggaaa cctctgaatc aaatcagagg 2280 

cccaagcgtt tagattctgc tgatgccaaa gtttattcag gatattacca tctggaccaa 2340 

aaatcaatga ctggcattgc tgtaggtgtt ggcatagcct tgacctgcat cctcatctgt 2400 

gttctcatct tgatataccg aagtaaagcc aggaaatcat ctgcttccaa gacggcacag 2460 

aatggaactc aacagttacc tcgtaccagt gcctccttag ctagtggaaa tgaggtagga 2520 

aagaacctgg aaggagctgt aggaaatgaa gaatctttaa tgccaatgat catgccaaac 2580 

agcttcattg atgcaaaggg aggaactgac ctgataatta atagctatgg tcctataatt 2640 

aaaaacaact ctaagaaaaa gtggtttttt ttccaagact caaagaagat acaagttgag 2700 

cagcctcaaa gaagatttac tccagcggtc tgcttttacc agccaggcac cactgtatta 2760 

atcagtgatg aagactcccc tagctcccca ggtcagacaa ccagcttctc aagacccttt 2820 

ggtgttgcag ctgatacaga acattcagca aatagtgaag gcagccatga gactggggat 2880 

tctgggcggt tttctcatga gtccaacgat gagatacatc tgtcctcagt tataagtacc 2940 

acacccccca acctctga 2958



<210> SEQ ID NO 10 

<211> LENGTH: 985 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 10 

Met Ala Pro Pro Leu Arg Pro Leu Ala Arg Leu Arg Pro Pro Gly Met 
1 5 10 15 

Leu Leu Arg Ala Leu Leu Leu Leu Leu Leu Leu Ser Pro Leu Pro Gly 
20 25 30 

Val Trp Cys Phe Ser Glu Leu Ser Phe Val Lys Glu Pro Gln Asp Val
35 40 45

Thr Val Thr Arg Lys Asp Pro Val Val Leu Asp Cys Gln Ala His Gly
50 55 60

Glu Val Pro Ile Lys Val Thr Trp Leu Lys Asn Gly Ala Lys Met Ser
65 70 75 80

Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr Ile Ser
85 90 95

Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe Tyr Gln
100 105 110

Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys Ala His
115 120 125

Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile Ser Thr
130 135 140

Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile Ser Ser
145 150 155 160

His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr Leu Pro
165 170 175

Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu Gln Ile
180 185 190

Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile Ala Ala
195 200 205

Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr Val Ile
210 215 220

Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile Ala Gly
225 230 235 240

Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu Glu Cys
245 250 255

Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg Leu Asp
260 265 270

His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn Gly Asn
275 280 285

Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr Val Cys
290 295 300

Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met Ala Thr
305 310 315 320

Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu Ser Leu
325 330 335

Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala Glu Gly
340 345 350

Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys Ile His
355 360 365

Ser Asn Gly Arg Ile Lys Met Tyr Asn Ser Lys Leu Val Ile Asn Gln
370 375 380

Ile Ile Pro Glu Asp Asp Ala Ile Tyr Gln Cys Met Ala Glu Asn Ser
385 390 395 400

Gln Gly Ser Ile Leu Ser Arg Ala Arg Leu Thr Val Val Met Ser Glu
405 410 415

Asp Arg Pro Ser Ala Pro Tyr Asn Val His Ala Glu Thr Met Ser Ser
420 425 430

Ser Ala Ile Leu Leu Ala Trp Glu Arg Pro Leu Tyr Asn Ser Asp Lys
435 440 445

Val Ile Ala Tyr Ser Val His Tyr Met Lys Ala Glu Gly Leu Asn Asn
450 455 460

Glu Glu Tyr Gln Val Val Ile Gly Asn Asp Thr Thr His Tyr Ile Ile
465 470 475 480

Asp Asp Leu Glu Pro Ala Ser Asn Tyr Thr Phe Tyr Ile Val Ala Tyr
485 490 495

Met Pro Met Gly Ala Ser Gln Met Ser Asp His Val Thr Gln Asn Thr
500 505 510

Leu Glu Asp Asp Pro Arg Arg Lys Tyr His Val Arg Leu Leu Ala Tyr
515 520 525

Asn Asn Ile Asp Asp Gly Tyr Gln Ala Asp Gln Thr Val Ser Thr Pro
530 535 540

Gly Cys Val Ser Val Arg Asp Arg Met Val Pro Pro Pro Pro Pro Pro
545 550 555 560

His His Leu Tyr Ala Lys Ala Asn Thr Ser Ser Ser Ile Phe Leu His
565 570 575

Trp Arg Arg Pro Ala Phe Thr Ala Ala Gln Ile Ile Asn Tyr Thr Ile
580 585 590

Arg Cys Asn Pro Val Gly Leu Gln Asn Ala Ser Leu Val Leu Tyr Leu
595 600 605

Gln Thr Ser Glu Thr His Met Leu Val Gln Gly Leu Glu Pro Asn Thr
610 615 620

Lys Tyr Glu Phe Ala Val Arg Leu His Val Asp Gln Leu Ser Ser Pro
625 630 635 640

Trp Ser Pro Val Val Tyr His Ser Thr Leu Pro Glu Ala Pro Ala Gly
645 650 655

Pro Pro Val Gly Val Lys Val Thr Leu Ile Glu Asp Asp Thr Ala Leu
660 665 670

Val Ser Trp Lys Pro Pro Asp Gly Pro Glu Thr Val Val Thr Arg Tyr
675 680 685

Thr Ile Leu Tyr Ala Ser Arg Lys Ala Trp Ile Ala Gly Glu Trp Gln
690 695 700

Val Leu His Arg Glu Gly Ala Ile Thr Met Ala Leu Leu Glu Asn Leu
705 710 715 720

Val Ala Gly Asn Val Tyr Ile Val Lys Ile Ser Ala Ser Asn Glu Val
725 730 735

Gly Glu Gly Pro Phe Ser Asn Ser Val Glu Leu Ala Val Leu Pro Lys
740 745 750

Glu Thr Ser Glu Ser Asn Gln Arg Pro Lys Arg Leu Asp Ser Ala Asp
755 760 765

Ala Lys Val Tyr Ser Gly Tyr Tyr His Leu Asp Gln Lys Ser Met Thr
770 775 780

Gly Ile Ala Val Gly Val Gly Ile Ala Leu Thr Cys Ile Leu Ile Cys
785 790 795 800

Val Leu Ile Leu Ile Tyr Arg Ser Lys Ala Arg Lys Ser Ser Ala Ser
805 810 815

Lys Thr Ala Gln Asn Gly Thr Gln Gln Leu Pro Arg Thr Ser Ala Ser
820 825 830

Leu Ala Ser Gly Asn Glu Val Gly Lys Asn Leu Glu Gly Ala Val Gly
835 840 845

Asn Glu Glu Ser Leu Met Pro Met Ile Met Pro Asn Ser Phe Ile Asp
850 855 860

Ala Lys Gly Gly Thr Asp Leu Ile Ile Asn Ser Tyr Gly Pro Ile Ile
865 870 875 880

Lys Asn Asn Ser Lys Lys Lys Trp Phe Phe Phe Gln Asp Ser Lys Lys
885 890 895

Ile Gln Val Glu Gln Pro Gln Arg Arg Phe Thr Pro Ala Val Cys Phe
900 905 910

Tyr Gln Pro Gly Thr Thr Val Leu Ile Ser Asp Glu Asp Ser Pro Ser
915 920 925

Ser Pro Gly Gln Thr Thr Ser Phe Ser Arg Pro Phe Gly Val Ala Ala
930 935 940

Asp Thr Glu His Ser Ala Asn Ser Glu Gly Ser His Glu Thr Gly Asp
945 950 955 960

Ser Gly Arg Phe Ser His Glu Ser Asn Asp Glu Ile His Leu Ser Ser
965 970 975

Val Ile Ser Thr Thr Pro Pro Asn Leu
980 985



<210> SEQ ID NO 11 

<211> LENGTH: 2976 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 11 

atgtctgaaa ataaacggat cgaggttctt tctaacggct ctttatacat cagtgaggtg 60 

gaaggcaggc gaggagagca gtccgatgaa ggattttatc agtgcttggc aatgaacaaa 120 

tatggagcca ttcttagtca aaaagctcat cttgccttat caactatttc tgcatttgaa 180 

gtccagccaa tttccactga ggtccacgaa ggtggagttg ctcgatttgc atgcaagatt 240 

tcatcccacc ctcctgcagt cataacatgg gagttcaatc ggacaactct acctatgact 300 

atggacagga taactgccct accaacagga gtattgcaga tctatgatgt cagccaaagg 360 

gattctggaa attatcgttg tattgctgcc actgtagccc accgacgtaa aagtatggag 420 

gcctcgctaa ctgtgattcc agctaaggag tcaaaatcct tccacacacc arcaattata 480 

gcaggtccac agaacataac aacatctctt catcagactg tagttttgga atgcatggcc 540 

acaggaaatc ccaaaccaat catttcttgg agccgccttg atcacaaatc cattgatgtc 600 

tttaatactc gggtacttgg aaatggtaat ctcatgatat ctgatgtcag gctacaacat 660 

gctggagtat atgtttgtcg ggccactacc cctggcacac gcaactttac agttgctatg 720 

gcaactttaa ctgtattagc tcctccttca tttgttgaat ggccagaaag tttaacaagg 780 

cctcgagctg gcactgctcg atttgtgtgt caggcagaag gaatcccctc tcccaagatg 840 

tcatggttga aaaatggaag gaagatacat tcgaatggta gaattaaaat gtacaacagt 900 

aaattggtaa ttaaccagat tattcctgaa gatgatgcta tttatcagtg catggctgag 960 

aatagccaag gatctatttt atctagagcc agactgactg tagtgatgtc agaagacaga 1020 

cccagtgctc cctataatgt acatgctgaa accatgtcaa gctcagccat tcttttagcc 1080 

tgggagaggc cactttataa ttcagacaaa gtcattgcct attctgtaca ctacatgaaa 1140 

gcagaaggtt taaataatga agagtatcaa gtagtcatcg gaaatgacac aactcattat 1200 

attattgatg acttagagcc tgccagcaat tatactttct acattgtagc atatatgcca 1260 

atgggagcca gccagatgtc tgaccatgtg acacagaata ctctagagga tgttcccctg 1320 

agacctcctg aaattagttt gacaagtcga agtcccactg atattctcat ctcctggctg 1380 

ccaatcccag ccaaatatcg gcggggccaa gtggtgctgt atcgcttgtc tttccgccta 1440

agtactgaga attcaatcca agttctggag ctcccgggga ccacgcatga gtaccttttg 1500 

gaaggcctga aacctgacag tgtctacctg gttcggatta ctgctgccac cagagtgggg 1560 

ctgggagagt catcagtatg gacttcacat aggacgccca aagctacaag cgtgaaagcc 1620 

cctaagtctc cagagttgca tttggagcct ctgaactgta ccaccatttc tgtgaggtgg 1680 

cagcaagatg tagaggacac agctgctatt cagggctaca agctgtacta caaggaagaa 1740 

gggcagcagg agaatgggcc cattttcttg gataccaagg acctactcta tactctcagt 1800 

ggcttagacc ccagaagaaa atatcatgtg agactcctgg cttacaacaa catagacgat 1860 

ggctatcagg cagatcagac tgtcagcact ccaggatgcg tgtctgttcg tgatcgcatg 1920 

gtccctcctc caccaccacc ccaccatctc tatgcgaagg ctaacacctc atcttccatc 1980 
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ttcctgcact ggaggaggcc tgcattcacc gctgcacaaa tcattaacta caccatccgc 2040 
tgtaatcctg ttggcctgca gaatgcttct ttggttctgt accttcaaac atcagaaact 2100 
cacatgttgg ttcaaggtct agaaccaaac accaaatacg aatttgccgt tcgattacat 2160 
gtggatcagc tttccagtcc ttggagccct gtagtctacc attctactct tccagaagca 2220 

ccagcaggcc caccagttgg agtaaaagtg acattaatag aggatgacac tgccctggtt 2280 

tcttggaaac cccctgatgg cccagaaaca gttgtgaccc gctatactat cttatatgca 2340 

tctaggaagg cctggattgc aggagagtgg caggtcttac accgtgaagg ggcaataacc 2400 

atggctttgc tagaaaactt ggtagcagga aatgtgtaca ttgtcaagat atctgcatcc 2460 

aatgaggtgg gagaaggacc cttttcaaat tctgtggagc tggcagtact tccaaaggaa 2520 

acctctgaat caaatcagag gcccaagcgt ttagattctg ctgatgccaa agtttattca 2580 

ggatattacc atctggacca aaaatcaatg actggcattg ctgtaggtgt tggcatagcc 2640 

ttgacctgca tcctcatctg tgttctcatc ttgatatacc gaagtaaagc caggaaatca 2700 

tctgcttcca agacggcaca gaatggaact caacagttac ctcgtaccag tgcctcctta 2760 

gctagtggaa atgaggtagg aaagaacctg gaaggagctg taggaaatga agaatcttta 2820 

atgccaatga tcatgccaaa cagcttcatt gatgcaaagg tactgagctg cgggatttgc 2880 

tgcataagcc gttcttccat tcctcctccc tgtgtgtgta aaatgtactt cccccaaaat 2940 

tgtatgttga atgtattata ccaatactct tattaa 2976 

<210> SEQ ID NO 12 

<211> LENGTH: 991 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 12

Met Ser Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr
1 5 10 15

Ile Ser Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe
20 25 30

Tyr Gln Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys
35 40 45

Ala His Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile
50 55 60

Ser Thr Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile
65 70 75 80

Ser Ser His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr
85 90 95

Leu Pro Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu
100 105 110

Gln Ile Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile
115 120 125

Ala Ala Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr
130 135 140

Val Ile Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile
145 150 155 160

Ala Gly Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu
165 170 175

Glu Cys Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg
180 185 190

Leu Asp His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn
195 200 205

Gly Asn Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr
210 215 220

Val Cys Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met
225 230 235 240

Ala Thr Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu
245 250 255

Ser Leu Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala
260 265 270

Glu Gly Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys
275 280 285

Ile His Ser Asn Gly Arg Ile Lys Met Tyr Asn Ser Lys Leu Val Ile
290 295 300

Asn Gln Ile Ile Pro Glu Asp Asp Ala Ile Tyr Gln Cys Met Ala Glu
305 310 315 320

Asn Ser Gln Gly Ser Ile Leu Ser Arg Ala Arg Leu Thr Val Val Met
325 330 335

Ser Glu Asp Arg Pro Ser Ala Pro Tyr Asn Val His Ala Glu Thr Met
340 345 350

Ser Ser Ser Ala Ile Leu Leu Ala Trp Glu Arg Pro Leu Tyr Asn Ser
355 360 365

Asp Lys Val Ile Ala Tyr Ser Val His Tyr Met Lys Ala Glu Gly Leu
370 375 380

Asn Asn Glu Glu Tyr Gln Val Val Ile Gly Asn Asp Thr Thr His Tyr
385 390 395 400

Ile Ile Asp Asp Leu Glu Pro Ala Ser Asn Tyr Thr Phe Tyr Ile Val
405 410 415

Ala Tyr Met Pro Met Gly Ala Ser Gln Met Ser Asp His Val Thr Gln
420 425 430

Asn Thr Leu Glu Asp Val Pro Leu Arg Pro Pro Glu Ile Ser Leu Thr
435 440 445

Ser Arg Ser Pro Thr Asp Ile Leu Ile Ser Trp Leu Pro Ile Pro Ala
450 455 460

Lys Tyr Arg Arg Gly Gln Val Val Leu Tyr Arg Leu Ser Phe Arg Leu
465 470 475 480

Ser Thr Glu Asn Ser Ile Gln Val Leu Glu Leu Pro Gly Thr Thr His
485 490 495

Glu Tyr Leu Leu Glu Gly Leu Lys Pro Asp Ser Val Tyr Leu Val Arg
500 505 510

Ile Thr Ala Ala Thr Arg Val Gly Leu Gly Glu Ser Ser Val Trp Thr
515 520 525

Ser His Arg Thr Pro Lys Ala Thr Ser Val Lys Ala Pro Lys Ser Pro
530 535 540

Glu Leu His Leu Glu Pro Leu Asn Cys Thr Thr Ile Ser Val Arg Trp
545 550 555 560

Gln Gln Asp Val Glu Asp Thr Ala Ala Ile Gln Gly Tyr Lys Leu Tyr
565 570 575

Tyr Lys Glu Glu Gly Gln Gln Glu Asn Gly Pro Ile Phe Leu Asp Thr
580 585 590

Lys Asp Leu Leu Tyr Thr Leu Ser Gly Leu Asp Pro Arg Arg Lys Tyr
595 600 605

His Val Arg Leu Leu Ala Tyr Asn Asn Ile Asp Asp Gly Tyr Gln Ala
610 615 620



Asp Gln Thr Val Ser Thr Pro Gly Cys Val Ser Val Arg Asp Arg Met
625 630 635 640

Val Pro Pro Pro Pro Pro Pro His His Leu Tyr Ala Lys Ala Asn Thr
645 650 655

Ser Ser Ser Ile Phe Leu His Trp Arg Arg Pro Ala Phe Thr Ala Ala
660 665 670

Gln Ile Ile Asn Tyr Thr Ile Arg Cys Asn Pro Val Gly Leu Gln Asn
675 680 685

Ala Ser Leu Val Leu Tyr Leu Gln Thr Ser Glu Thr His Met Leu Val
690 695 700

Gln Gly Leu Glu Pro Asn Thr Lys Tyr Glu Phe Ala Val Arg Leu His
705 710 715 720

Val Asp Gln Leu Ser Ser Pro Trp Ser Pro Val Val Tyr His Ser Thr
725 730 735

Leu Pro Glu Ala Pro Ala Gly Pro Pro Val Gly Val Lys Val Thr Leu
740 745 750

Ile Glu Asp Asp Thr Ala Leu Val Ser Trp Lys Pro Pro Asp Gly Pro
755 760 765

Glu Thr Val Val Thr Arg Tyr Thr Ile Leu Tyr Ala Ser Arg Lys Ala
770 775 780

Trp Ile Ala Gly Glu Trp Gln Val Leu His Arg Glu Gly Ala Ile Thr
785 790 795 800

Met Ala Leu Leu Glu Asn Leu Val Ala Gly Asn Val Tyr Ile Val Lys
805 810 815

Ile Ser Ala Ser Asn Glu Val Gly Glu Gly Pro Phe Ser Asn Ser Val
820 825 830

Glu Leu Ala Val Leu Pro Lys Glu Thr Ser Glu Ser Asn Gln Arg Pro
835 840 845

Lys Arg Leu Asp Ser Ala Asp Ala Lys Val Tyr Ser Gly Tyr Tyr His
850 855 860

Leu Asp Gln Lys Ser Met Thr Gly Ile Ala Val Gly Val Gly Ile Ala
865 870 875 880

Leu Thr Cys Ile Leu Ile Cys Val Leu Ile Leu Ile Tyr Arg Ser Lys
885 890 895

Ala Arg Lys Ser Ser Ala Ser Lys Thr Ala Gln Asn Gly Thr Gln Gln
900 905 910

Leu Pro Arg Thr Ser Ala Ser Leu Ala Ser Gly Asn Glu Val Gly Lys
915 920 925

Asn Leu Glu Gly Ala Val Gly Asn Glu Glu Ser Leu Met Pro Met Ile
930 935 940

Met Pro Asn Ser Phe Ile Asp Ala Lys Val Leu Ser Cys Gly Ile Cys
945 950 955 960

Cys Ile Ser Arg Ser Ser Ile Pro Pro Pro Cys Val Cys Lys Met Tyr
965 970 975

Phe Pro Gln Asn Cys Met Leu Asn Val Leu Tyr Gln Tyr Ser Tyr
980 985 990



<210> SEQ ID NO 13 

<211> LENGTH: 909 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 13

atgtctgaaa ataaacggat cgaggttctt tctaacggct ctttatacat cagtgaggtg 60

gaaggcaggc gaggagagca gtccgatgaa ggattttatc agtgcttggc aatgaacaaa 120

tatggagcca ttcttagtca aaaagctcat cttgccttat caactatttc tgcatttgaa 180

gtccagccaa tttccactga ggtccacgaa ggtggagttg ctcgatttgc atgcaagatt 240 

tcatcccacc ctcctgcagt cataacatgg gagttcaatc ggacaactct acctatgact 300 

atggacagga taactgccct accaacagga gtattgcaga tctatgatgt cagccaaagg 360 

gattctggaa attatcgttg tattgctgcc actgtagccc accgacgtaa aagtatggag 420 

gcctcgctaa ctgtgattcc agctaaggag tcaaaatcct tccacacacc arcaattata 480 

gcaggtccac agaacataac aacatctctt catcagactg tagttttgga atgcatggcc 540 

acaggaaatc ccaaaccaat catttcttgg agccgccttg atcacaaatc cattgatgtc 600 

tttaatactc gggtacttgg aaatggtaat ctcatgatat ctgatgtcag gctacaacat 660 

gctggagtat atgtttgtcg ggccactacc cctggcacac gcaactttac agttgctatg 720 

gcaactttaa ctgtattagc tcctccttca tttgttgaat ggccagaaag tttaacaagg 780 

cctcgagctg gcactgctcg atttgtgtgt caggcagaag gaatcccctc tcccaagatg 840

tcatggttga aaaatggaag gaagatacat tcgaatggta gaattaaaat gtacaacagg 900

tttaaataa 909



<210> SEQ ID NO 14 

<211> LENGTH: 302 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 14 

Met Ser Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr
1 5 10 15

Ile Ser Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe
20 25 30

Tyr Gln Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys
35 40 45

Ala His Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile
50 55 60

Ser Thr Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile
65 70 75 80

Ser Ser His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr
85 90 95

Leu Pro Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu
100 105 110

Gln Ile Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile
115 120 125

Ala Ala Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr
130 135 140

Val Ile Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile
145 150 155 160

Ala Gly Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu
165 170 175

Glu Cys Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg
180 185 190

Leu Asp His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn
195 200 205

Gly Asn Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr
210 215 220

Val Cys Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met
225 230 235 240

Ala Thr Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu
245 250 255

Ser Leu Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala
260 265 270

Glu Gly Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys
275 280 285

Ile His Ser Asn Gly Arg Ile Lys Met Tyr Asn Arg Phe Lys
290 295 300

<210> SEQ ID NO 15 

<211> LENGTH: 2481 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 15 

atgtctgaaa ataaacggat cgaggttctt tctaacggct ctttatacat cagtgaggtg 60 

gaaggcaggc gaggagagca gtccgatgaa ggattttatc agtgcttggc aatgaacaaa 120 

tatggagcca ttcttagtca aaaagctcat cttgccttat caactatttc tgcatttgaa 180 

gtccagccaa tttccactga ggtccacgaa ggtggagttg ctcgatttgc atgcaagatt 240 

tcatcccacc ctcctgcagt cataacatgg gagttcaatc ggacaactct acctatgact 300 

atggacagga taactgccct accaacagga gtattgcaga tctatgatgt cagccaaagg 360 

gattctggaa attatcgttg tattgctgcc actgtagccc accgacgtaa aagtatggag 420 

gcctcgctaa ctgtgattcc agctaaggag tcaaaatcct tccacacacc arcaattata 480 

gcaggtccac agaacataac aacatctctt catcagactg tagttttgga atgcatggcc 540 

acaggaaatc ccaaaccaat catttcttgg agccgccttg atcacaaatc cattgatgtc 600 

tttaatactc gggtacttgg aaatggtaat ctcatgatat ctgatgtcag gctacaacat 660

gctggagtat atgtttgtcg ggccactacc cctggcacac gcaactttac agttgctatg 720

gcaactttaa ctgtattagc tcctccttca tttgttgaat ggccagaaag tttaacaagg 780

cctcgagctg gcactgctcg atttgtgtgt caggcagaag gaatcccctc tcccaagatg 840



tcatggttga aaaatggaag gaagatacat tcgaatggta gaattaaaat gtacaacagt 900 

aaattggtaa ttaaccagat tattcctgaa gatgatgcta tttatcagtg catggctgag 960 

aatagccaag gatctatttt atctagagcc agactgactg tagtgatgtc agaagacaga 1020 

cccagtgctc cctataatgt acatgctgaa accatgtcaa gctcagccat tcttttagcc 1080 

tgggagaggc cactttataa ttcagacaaa gtcattgcct attctgtaca ctacatgaaa 1140

gcagaaggtt taaataatga agagtatcaa gtagtcatcg gaaatgacac aactcattat 1200 

attattgatg acttagagcc tgccagcaat tatactttct acattgtagc atatatgcca 1260 

atgggagcca gccagatgtc tgaccatgtg acacagaata ctctagagga tgaccccaga 1320 

agaaaatatc atgtgagact cctggcttac aacaacatag acgatggcta tcaggcagat 1380 

cagactgtca gcactccagg atgcgtgtct gttcgtgatc gcatggtccc tcctccacca 1440

ccaccccacc atctctatgc gaaggctaac acctcatctt ccatcttcct gcactggagg 1500 

aggcctgcat tcaccgctgc acaaatcatt aactacacca tccgctgtaa tcctgttggc 1560 

ctgcagaatg cttctttggt tctgtacctt caaacatcag aaactcacat gttggttcaa 1620

ggtctagaac caaacaccaa atacgaattt gccgttcgat tacatgtgga tcagctttcc 1680

agtccttgga gccctgtagt ctaccattct actcttccag aagcaccagc aggcccacca 1740

gttggagtaa aagtgacatt aatagaggat gacactgccc tggtttcttg gaaaccccct 1800

gatggcccag aaacagttgt gacccgctat actatcttat atgcatctag gaaggcctgg 1860

attgcaggag agtggcaggt cttacaccgt gaaggggcaa taaccatggc tttgctagaa 1920

aacttggtag caggaaatgt gtacattgtc aagatatctg catccaatga ggtgggagaa 1980

ggaccctttt caaattctgt ggagctggca gtacttccaa aggaaacctc tgaatcaaat 2040

cagaggccca agcgtttaga ttctgctgat gccaaagttt attcaggata ttaccatctg 2100

gaccaaaaat caatgactgg cattgctgta ggtgttggca tagccttgac ctgcatcctc 2160

atctgtgttc tcatcttgat ataccgaagt aaagccagga aatcatctgc ttccaagacg 2220

gcacagaatg gaactcaaca gttacctcgt accagtgcct ccttagctag tggaaatgag 2280

gtaggaaaga acctggaagg agctgtagga aatgaagaat ctttaatgcc aatgatcatg 2340

ccaaacagct tcattgatgc aaaggtactg agctgcggga tttgctgcat aagccgttct 2400

tccattcctc ctccctgtgt gtgtaaaatg tacttccccc aaaattgtat gttgaatgta 2460

ttataccaat actcttatta a 2481

<210> SEQ ID NO 16

<211> LENGTH: 826

<212> TYPE: PRT

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 16

Met Ser Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr
1 5 10 15

Ile Ser Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe
20 25 30

Tyr Gln Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys
35 40 45

Ala His Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile
50 55 60

Ser Thr Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile
65 70 75 80

Ser Ser His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr
85 90 95

Leu Pro Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu
100 105 110

Gln Ile Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile
115 120 125

Ala Ala Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr
130 135 140

Val Ile Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile
145 150 155 160

Ala Gly Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu
165 170 175

Glu Cys Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg
180 185 190

Leu Asp His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn
195 200 205

Gly Asn Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr
210 215 220

Val Cys Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met
225 230 235 240

Ala Thr Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu
245 250 255

Ser Leu Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala
260 265 270

Glu Gly Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys
275 280 285

Ile His Ser Asn Gly Arg Ile Lys Met Tyr Asn Ser Lys Leu Val Ile
290 295 300

Asn Gln Ile Ile Pro Glu Asp Asp Ala Ile Tyr Gln Cys Met Ala Glu
305 310 315 320

Asn Ser Gln Gly Ser Ile Leu Ser Arg Ala Arg Leu Thr Val Val Met
325 330 335

Ser Glu Asp Arg Pro Ser Ala Pro Tyr Asn Val His Ala Glu Thr Met
340 345 350

Ser Ser Ser Ala Ile Leu Leu Ala Trp Glu Arg Pro Leu Tyr Asn Ser
355 360 365

Asp Lys Val Ile Ala Tyr Ser Val His Tyr Met Lys Ala Glu Gly Leu
370 375 380

Asn Asn Glu Glu Tyr Gln Val Val Ile Gly Asn Asp Thr Thr His Tyr
385 390 395 400

Ile Ile Asp Asp Leu Glu Pro Ala Ser Asn Tyr Thr Phe Tyr Ile Val
405 410 415

Ala Tyr Met Pro Met Gly Ala Ser Gln Met Ser Asp His Val Thr Gln
420 425 430

Asn Thr Leu Glu Asp Asp Pro Arg Arg Lys Tyr His Val Arg Leu Leu
435 440 445

Ala Tyr Asn Asn Ile Asp Asp Gly Tyr Gln Ala Asp Gln Thr Val Ser
450 455 460

Thr Pro Gly Cys Val Ser Val Arg Asp Arg Met Val Pro Pro Pro Pro
465 470 475 480

Pro Pro His His Leu Tyr Ala Lys Ala Asn Thr Ser Ser Ser Ile Phe
485 490 495

Leu His Trp Arg Arg Pro Ala Phe Thr Ala Ala Gln Ile Ile Asn Tyr
500 505 510

Thr Ile Arg Cys Asn Pro Val Gly Leu Gln Asn Ala Ser Leu Val Leu
515 520 525

Tyr Leu Gln Thr Ser Glu Thr His Met Leu Val Gln Gly Leu Glu Pro
530 535 540

Asn Thr Lys Tyr Glu Phe Ala Val Arg Leu His Val Asp Gln Leu Ser
545 550 555 560

Ser Pro Trp Ser Pro Val Val Tyr His Ser Thr Leu Pro Glu Ala Pro
565 570 575

Ala Gly Pro Pro Val Gly Val Lys Val Thr Leu Ile Glu Asp Asp Thr
580 585 590

Ala Leu Val Ser Trp Lys Pro Pro Asp Gly Pro Glu Thr Val Val Thr
595 600 605

Arg Tyr Thr Ile Leu Tyr Ala Ser Arg Lys Ala Trp Ile Ala Gly Glu
610 615 620

Trp Gln Val Leu His Arg Glu Gly Ala Ile Thr Met Ala Leu Leu Glu
625 630 635 640

Asn Leu Val Ala Gly Asn Val Tyr Ile Val Lys Ile Ser Ala Ser Asn
645 650 655

Glu Val Gly Glu Gly Pro Phe Ser Asn Ser Val Glu Leu Ala Val Leu
660 665 670

Pro Lys Glu Thr Ser Glu Ser Asn Gln Arg Pro Lys Arg Leu Asp Ser
675 680 685

Ala Asp Ala Lys Val Tyr Ser Gly Tyr Tyr His Leu Asp Gln Lys Ser
690 695 700

Met Thr Gly Ile Ala Val Gly Val Gly Ile Ala Leu Thr Cys Ile Leu
705 710 715 720

Ile Cys Val Leu Ile Leu Ile Tyr Arg Ser Lys Ala Arg Lys Ser Ser
725 730 735

Ala Ser Lys Thr Ala Gln Asn Gly Thr Gln Gln Leu Pro Arg Thr Ser
740 745 750

Ala Ser Leu Ala Ser Gly Asn Glu Val Gly Lys Asn Leu Glu Gly Ala
755 760 765

Val Gly Asn Glu Glu Ser Leu Met Pro Met Ile Met Pro Asn Ser Phe
770 775 780

Ile Asp Ala Lys Val Leu Ser Cys Gly Ile Cys Cys Ile Ser Arg Ser
785 790 795 800

Ser Ile Pro Pro Pro Cys Val Cys Lys Met Tyr Phe Pro Gln Asn Cys
805 810 815

Met Leu Asn Val Leu Tyr Gln Tyr Ser Tyr
820 825



<210> SEQ ID NO 17 

<211> LENGTH: 3219 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 17

atgtctgaaa ataaacggat cgaggttctt tctaacggct ctttatacat cagtgaggtg 60

gaaggcaggc gaggagagca gtccgatgaa ggattttatc agtgcttggc aatgaacaaa 120

tatggagcca ttcttagtca aaaagctcat cttgccttat caactatttc tgcatttgaa 180

gtccagccaa tttccactga ggtccacgaa ggtggagttg ctcgatttgc atgcaagatt 240

tcatcccacc ctcctgcagt cataacatgg gagttcaatc ggacaactct acctatgact 300

atggacagga taactgccct accaacagga gtattgcaga tctatgatgt cagccaaagg 360

gattctggaa attatcgttg tattgctgcc actgtagccc accgacgtaa aagtatggag 420

gcctcgctaa ctgtgattcc agctaaggag tcaaaatcct tccacacacc arcaattata 480

gcaggtccac agaacataac aacatctctt catcagactg tagttttgga atgcatggcc 540

acaggaaatc ccaaaccaat catttcttgg agccgccttg atcacaaatc cattgatgtc 600

tttaatactc gggtacttgg aaatggtaat ctcatgatat ctgatgtcag gctacaacat 660

gctggagtat atgtttgtcg ggccactacc cctggcacac gcaactttac agttgctatg 720

gcaactttaa ctgtattagc tcctccttca tttgttgaat ggccagaaag tttaacaagg 780

cctcgagctg gcactgctcg atttgtgtgt caggcagaag gaatcccctc tcccaagatg 840

tcatggttga aaaatggaag gaagatacat tcgaatggta gaattaaaat gtacaacagt 900

aaattggtaa ttaaccagat tattcctgaa gatgatgcta tttatcagtg catggctgag 960

aatagccaag gatctatttt atctagagcc agactgactg tagtgatgtc agaagacaga 1020

cccagtgctc cctataatgt acatgctgaa accatgtcaa gctcagccat tcttttagcc 1080

tgggagaggc cactttataa ttcagacaaa gtcattgcct attctgtaca ctacatgaaa 1140

gcagaaggtt taaataatga agagtatcaa gtagtcatcg gaaatgacac aactcattat 1200

attattgatg acttagagcc tgccagcaat tatactttct acattgtagc atatatgcca 1260

atgggagcca gccagatgtc tgaccatgtg acacagaata ctctagagga tgttcccctg 1320

agacctcctg aaattagttt gacaagtcga agtcccactg atattctcat ctcctggctg 1380

ccaatcccag ccaaatatcg gcggggccaa gtggtgctgt atcgcttgtc tttccgccta 1440

agtactgaga attcaatcca agttctggag ctcccgggga ccacgcatga gtaccttttg 1500

gaaggcctga aacctgacag tgtctacctg gttcggatta ctgctgccac cagagtgggg 1560

ctgggagagt catcagtatg gacttcacat aggacgccca aagctacaag cgtgaaagcc 1620

cctaagtctc cagagttgca tttggagcct ctgaactgta ccaccatttc tgtgaggtgg 1680

cagcaagatg tagaggacac agctgctatt cagggctaca agctgtacta caaggaagaa 1740

gggcagcagg agaatgggcc cattttcttg gataccaagg acctactcta tactctcagt 1800

ggcttagacc ccagaagaaa atatcatgtg agactcctgg cttacaacaa catagacgat 1860

ggctatcagg cagatcagac tgtcagcact ccaggatgcg tgtctgttcg tgatcgcatg 1920

gtccctcctc caccaccacc ccaccatctc tatgcgaagg ctaacacctc atcttccatc 1980

ttcctgcact ggaggaggcc tgcattcacc gctgcacaaa tcattaacta caccatccgc 2040

tgtaatcctg ttggcctgca gaatgcttct ttggttctgt accttcaaac atcagaaact 2100

cacatgttgg ttcaaggtct agaaccaaac accaaatacg aatttgccgt tcgattacat 2160

gtggatcagc tttccagtcc ttggagccct gtagtctacc attctactct tccagaagca 2220

ccagcaggcc caccagttgg agtaaaagtg acattaatag aggatgacac tgccctggtt 2280

tcttggaaac cccctgatgg cccagaaaca gttgtgaccc gctatactat cttatatgca 2340

tctaggaagg cctggattgc aggagagtgg caggtcttac accgtgaagg ggcaataacc 2400

atggctttgc tagaaaactt ggtagcagga aatgtgtaca ttgtcaagat atctgcatcc 2460

aatgaggtgg gagaaggacc cttttcaaat tctgtggagc tggcagtact tccaaaggaa 2520

acctctgaat caaatcagag gcccaagcgt ttagattctg ctgatgccaa agtttattca 2580

ggatattacc atctggacca aaaatcaatg actggcattg ctgtaggtgt tggcatagcc 2640

ttgacctgca tcctcatctg tgttctcatc ttgatatacc gaagtaaagc caggaaatca 2700

tctgcttcca agacggcaca gaatggaact caacagttac ctcgtaccag tgcctcctta 2760

gctagtggaa atgaggtagg aaagaacctg gaaggagctg taggaaatga agaatcttta 2820

atgccaatga tcatgccaaa cagcttcatt gatgcaaagg gaggaactga cctgataatt 2880

aatagctatg gtcctataat taaaaacaac tctaagaaaa agtggttttt tttccaagac 2940

tcaaagaaga tacaagttga gcagcctcaa agaagattta ctccagcggt ctgcttttac 3000

cagccaggca ccactgtatt aatcagtgat gaagactccc ctagctcccc aggtcagaca 3060

accagcttct caagaccctt tggtgttgca gctgatacag aacattcagc aaatagtgaa 3120

ggcagccatg agactgggga ttctgggcgg ttttctcatg agtccaacga tgagatacat 3180

ctgtcctcag ttataagtac cacacccccc aacctctga 3219

<210> SEQ ID NO 18

<211> LENGTH: 1072

<212> TYPE: PRT

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 18

Met Ser Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr
1 5 10 15

Ile Ser Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe
20 25 30

Tyr Gln Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys
35 40 45

Ala His Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile
50 55 60

Ser Thr Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile
65 70 75 80

Ser Ser His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr
85 90 95

Leu Pro Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu
100 105 110

Gln Ile Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile
115 120 125

Ala Ala Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr
130 135 140

Val Ile Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile
145 150 155 160

Ala Gly Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu
165 170 175

Glu Cys Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg
180 185 190

Leu Asp His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn
195 200 205

Gly Asn Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr
210 215 220

Val Cys Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met
225 230 235 240

Ala Thr Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu
245 250 255

Ser Leu Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala
260 265 270

Glu Gly Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys
275 280 285

Ile His Ser Asn Gly Arg Ile Lys Met Tyr Asn Ser Lys Leu Val Ile
290 295 300

Asn Gln Ile Ile Pro Glu Asp Asp Ala Ile Tyr Gln Cys Met Ala Glu
305 310 315 320

Asn Ser Gln Gly Ser Ile Leu Ser Arg Ala Arg Leu Thr Val Val Met
325 330 335

Ser Glu Asp Arg Pro Ser Ala Pro Tyr Asn Val His Ala Glu Thr Met
340 345 350

Ser Ser Ser Ala Ile Leu Leu Ala Trp Glu Arg Pro Leu Tyr Asn Ser
355 360 365

Asp Lys Val Ile Ala Tyr Ser Val His Tyr Met Lys Ala Glu Gly Leu
370 375 380

Asn Asn Glu Glu Tyr Gln Val Val Ile Gly Asn Asp Thr Thr His Tyr
385 390 395 400

Ile Ile Asp Asp Leu Glu Pro Ala Ser Asn Tyr Thr Phe Tyr Ile Val
405 410 415



Ala Tyr Met Pro Met Gly Ala Ser Gln Met Ser Asp His Val Thr Gln
420 425 430

Asn Thr Leu Glu Asp Val Pro Leu Arg Pro Pro Glu Ile Ser Leu Thr
435 440 445

Ser Arg Ser Pro Thr Asp Ile Leu Ile Ser Trp Leu Pro Ile Pro Ala
450 455 460

Lys Tyr Arg Arg Gly Gln Val Val Leu Tyr Arg Leu Ser Phe Arg Leu
465 470 475 480

Ser Thr Glu Asn Ser Ile Gln Val Leu Glu Leu Pro Gly Thr Thr His
485 490 495

Glu Tyr Leu Leu Glu Gly Leu Lys Pro Asp Ser Val Tyr Leu Val Arg
500 505 510

Ile Thr Ala Ala Thr Arg Val Gly Leu Gly Glu Ser Ser Val Trp Thr
515 520 525

Ser His Arg Thr Pro Lys Ala Thr Ser Val Lys Ala Pro Lys Ser Pro
530 535 540

Glu Leu His Leu Glu Pro Leu Asn Cys Thr Thr Ile Ser Val Arg Trp
545 550 555 560

Gln Gln Asp Val Glu Asp Thr Ala Ala Ile Gln Gly Tyr Lys Leu Tyr
565 570 575

Tyr Lys Glu Glu Gly Gln Gln Glu Asn Gly Pro Ile Phe Leu Asp Thr
580 585 590

Lys Asp Leu Leu Tyr Thr Leu Ser Gly Leu Asp Pro Arg Arg Lys Tyr
595 600 605

His Val Arg Leu Leu Ala Tyr Asn Asn Ile Asp Asp Gly Tyr Gln Ala
610 615 620

Asp Gln Thr Val Ser Thr Pro Gly Cys Val Ser Val Arg Asp Arg Met
625 630 635 640

Val Pro Pro Pro Pro Pro Pro His His Leu Tyr Ala Lys Ala Asn Thr
645 650 655

Ser Ser Ser Ile Phe Leu His Trp Arg Arg Pro Ala Phe Thr Ala Ala
660 665 670

Gln Ile Ile Asn Tyr Thr Ile Arg Cys Asn Pro Val Gly Leu Gln Asn
675 680 685

Ala Ser Leu Val Leu Tyr Leu Gln Thr Ser Glu Thr His Met Leu Val
690 695 700

Gln Gly Leu Glu Pro Asn Thr Lys Tyr Glu Phe Ala Val Arg Leu His
705 710 715 720

Val Asp Gln Leu Ser Ser Pro Trp Ser Pro Val Val Tyr His Ser Thr
725 730 735

Leu Pro Glu Ala Pro Ala Gly Pro Pro Val Gly Val Lys Val Thr Leu
740 745 750

Ile Glu Asp Asp Thr Ala Leu Val Ser Trp Lys Pro Pro Asp Gly Pro
755 760 765

Glu Thr Val Val Thr Arg Tyr Thr Ile Leu Tyr Ala Ser Arg Lys Ala
770 775 780

Trp Ile Ala Gly Glu Trp Gln Val Leu His Arg Glu Gly Ala Ile Thr
785 790 795 800

Met Ala Leu Leu Glu Asn Leu Val Ala Gly Asn Val Tyr Ile Val Lys
805 810 815

Ile Ser Ala Ser Asn Glu Val Gly Glu Gly Pro Phe Ser Asn Ser Val
820 825 830

Glu Leu Ala Val Leu Pro Lys Glu Thr Ser Glu Ser Asn Gln Arg Pro
835 840 845

Lys Arg Leu Asp Ser Ala Asp Ala Lys Val Tyr Ser Gly Tyr Tyr His
850 855 860

Leu Asp Gln Lys Ser Met Thr Gly Ile Ala Val Gly Val Gly Ile Ala
865 870 875 880

Leu Thr Cys Ile Leu Ile Cys Val Leu Ile Leu Ile Tyr Arg Ser Lys
885 890 895

Ala Arg Lys Ser Ser Ala Ser Lys Thr Ala Gln Asn Gly Thr Gln Gln
900 905 910

Leu Pro Arg Thr Ser Ala Ser Leu Ala Ser Gly Asn Glu Val Gly Lys
915 920 925

Asn Leu Glu Gly Ala Val Gly Asn Glu Glu Ser Leu Met Pro Met Ile
930 935 940

Met Pro Asn Ser Phe Ile Asp Ala Lys Gly Gly Thr Asp Leu Ile Ile
945 950 955 960

Asn Ser Tyr Gly Pro Ile Ile Lys Asn Asn Ser Lys Lys Lys Trp Phe
965 970 975

Phe Phe Gln Asp Ser Lys Lys Ile Gln Val Glu Gln Pro Gln Arg Arg
980 985 990

Phe Thr Pro Ala Val Cys Phe Tyr Gln Pro Gly Thr Thr Val Leu Ile
995 1000 1005

Ser Asp Glu Asp Ser Pro Ser Ser Pro Gly Gln Thr Thr Ser Phe Ser
1010 1015 1020

Arg Pro Phe Gly Val Ala Ala Asp Thr Glu His Ser Ala Asn Ser Glu
1025 1030 1035 1040

Gly Ser His Glu Thr Gly Asp Ser Gly Arg Phe Ser His Glu Ser Asn
1045 1050 1055

Asp Glu Ile His Leu Ser Ser Val Ile Ser Thr Thr Pro Pro Asn Leu
1060 1065 1070



<210> SEQ ID NO 19 

<211> LENGTH: 2724 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 19 

atgtctgaaa ataaacggat cgaggttctt tctaacggct ctttatacat cagtgaggtg 60 

gaaggcaggc gaggagagca gtccgatgaa ggattttatc agtgcttggc aatgaacaaa 120 

tatggagcca ttcttagtca aaaagctcat cttgccttat caactatttc tgcatttgaa 180 

gtccagccaa tttccactga ggtccacgaa ggtggagttg ctcgatttgc atgcaagatt 240 

tcatcccacc ctcctgcagt cataacatgg gagttcaatc ggacaactct acctatgact 300

atggacagga taactgccct accaacagga gtattgcaga tctatgatgt cagccaaagg 360 

gattctggaa attatcgttg tattgctgcc actgtagccc accgacgtaa aagtatggag 420 

gcctcgctaa ctgtgattcc agctaaggag tcaaaatcct tccacacacc arcaattata 480 

gcaggtccac agaacataac aacatctctt catcagactg tagttttgga atgcatggcc 540 

acaggaaatc ccaaaccaat catttcttgg agccgccttg atcacaaatc cattgatgtc 600 

tttaatactc gggtacttgg aaatggtaat ctcatgatat ctgatgtcag gctacaacat 660 

gctggagtat atgtttgtcg ggccactacc cctggcacac gcaactttac agttgctatg 720 

gcaactttaa ctgtattagc tcctccttca tttgttgaat ggccagaaag tttaacaagg 780

cctcgagctg gcactgctcg atttgtgtgt caggcagaag gaatcccctc tcccaagatg 840

tcatggttga aaaatggaag gaagatacat tcgaatggta gaattaaaat gtacaacagt 900

aaattggtaa ttaaccagat tattcctgaa gatgatgcta tttatcagtg catggctgag 960 

aatagccaag gatctatttt atctagagcc agactgactg tagtgatgtc agaagacaga 1020 

cccagtgctc cctataatgt acatgctgaa accatgtcaa gctcagccat tcttttagcc 1080 

tgggagaggc cactttataa ttcagacaaa gtcattgcct attctgtaca ctacatgaaa 1140 

gcagaaggtt taaataatga agagtatcaa gtagtcatcg gaaatgacac aactcattat 1200 

attattgatg acttagagcc tgccagcaat tatactttct acattgtagc atatatgcca 1260 

atgggagcca gccagatgtc tgaccatgtg acacagaata ctctagagga tgaccccaga 1320 

agaaaatatc atgtgagact cctggcttac aacaacatag acgatggcta tcaggcagat 1380 

cagactgtca gcactccagg atgcgtgtct gttcgtgatc gcatggtccc tcctccacca 1440 

ccaccccacc atctctatgc gaaggctaac acctcatctt ccatcttcct gcactggagg 1500 

aggcctgcat tcaccgctgc acaaatcatt aactacacca tccgctgtaa tcctgttggc 1560 

ctgcagaatg cttctttggt tctgtacctt caaacatcag aaactcacat gttggttcaa 1620 

ggtctagaac caaacaccaa atacgaattt gccgttcgat tacatgtgga tcagctttcc 1680 

agtccttgga gccctgtagt ctaccattct actcttccag aagcaccagc aggcccacca 1740 

gttggagtaa aagtgacatt aatagaggat gacactgccc tggtttcttg gaaaccccct 1800 

gatggcccag aaacagttgt gacccgctat actatcttat atgcatctag gaaggcctgg 1860 

attgcaggag agtggcaggt cttacaccgt gaaggggcaa taaccatggc tttgctagaa 1920 

aacttggtag caggaaatgt gtacattgtc aagatatctg catccaatga ggtgggagaa 1980 

ggaccctttt caaattctgt ggagctggca gtacttccaa aggaaacctc tgaatcaaat 2040 

cagaggccca agcgtttaga ttctgctgat gccaaagttt attcaggata ttaccatctg 2100 

gaccaaaaat caatgactgg cattgctgta ggtgttggca tagccttgac ctgcatcctc 2160 

atctgtgttc tcatcttgat ataccgaagt aaagccagga aatcatctgc ttccaagacg 2220 

gcacagaatg gaactcaaca gttacctcgt accagtgcct ccttagctag tggaaatgag 2280 

gtaggaaaga acctggaagg agctgtagga aatgaagaat ctttaatgcc aatgatcatg 2340 

ccaaacagct tcattgatgc aaagggagga actgacctga taattaatag ctatggtcct 2400 

ataattaaaa acaactctaa gaaaaagtgg ttttttttcc aagactcaaa gaagatacaa 2460

gttgagcagc ctcaaagaag atttactcca gcggtctgct tttaccagcc aggcaccact 2520 

gtattaatca gtgatgaaga ctcccctagc tccccaggtc agacaaccag cttctcaaga 2580 

ccctttggtg ttgcagctga tacagaacat tcagcaaata gtgaaggcag ccatgagact 2640

ggggattctg ggcggttttc tcatgagtcc aacgatgaga tacatctgtc ctcagttata 2700 

agtaccacac cccccaacct ctga 2724 



<210> SEQ ID NO 20 

<211> LENGTH: 907 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 20



Met Ser Glu Asn Lys Arg Ile Glu Val Leu Ser Asn Gly Ser Leu Tyr
1 5 10 15

Ile Ser Glu Val Glu Gly Arg Arg Gly Glu Gln Ser Asp Glu Gly Phe
20 25 30

Tyr Gln Cys Leu Ala Met Asn Lys Tyr Gly Ala Ile Leu Ser Gln Lys
35 40 45

Ala His Leu Ala Leu Ser Thr Ile Ser Ala Phe Glu Val Gln Pro Ile
50 55 60

Ser Thr Glu Val His Glu Gly Gly Val Ala Arg Phe Ala Cys Lys Ile
65 70 75 80

Ser Ser His Pro Pro Ala Val Ile Thr Trp Glu Phe Asn Arg Thr Thr
85 90 95

Leu Pro Met Thr Met Asp Arg Ile Thr Ala Leu Pro Thr Gly Val Leu
100 105 110

Gln Ile Tyr Asp Val Ser Gln Arg Asp Ser Gly Asn Tyr Arg Cys Ile
115 120 125

Ala Ala Thr Val Ala His Arg Arg Lys Ser Met Glu Ala Ser Leu Thr
130 135 140

Val Ile Pro Ala Lys Glu Ser Lys Ser Phe His Thr Pro Thr Ile Ile
145 150 155 160

Ala Gly Pro Gln Asn Ile Thr Thr Ser Leu His Gln Thr Val Val Leu
165 170 175

Glu Cys Met Ala Thr Gly Asn Pro Lys Pro Ile Ile Ser Trp Ser Arg
180 185 190

Leu Asp His Lys Ser Ile Asp Val Phe Asn Thr Arg Val Leu Gly Asn
195 200 205

Gly Asn Leu Met Ile Ser Asp Val Arg Leu Gln His Ala Gly Val Tyr
210 215 220

Val Cys Arg Ala Thr Thr Pro Gly Thr Arg Asn Phe Thr Val Ala Met
225 230 235 240

Ala Thr Leu Thr Val Leu Ala Pro Pro Ser Phe Val Glu Trp Pro Glu
245 250 255

Ser Leu Thr Arg Pro Arg Ala Gly Thr Ala Arg Phe Val Cys Gln Ala
260 265 270

Glu Gly Ile Pro Ser Pro Lys Met Ser Trp Leu Lys Asn Gly Arg Lys
275 280 285

Ile His Ser Asn Gly Arg Ile Lys Met Tyr Asn Ser Lys Leu Val Ile
290 295 300

Asn Gln Ile Ile Pro Glu Asp Asp Ala Ile Tyr Gln Cys Met Ala Glu
305 310 315 320

Asn Ser Gln Gly Ser Ile Leu Ser Arg Ala Arg Leu Thr Val Val Met
325 330 335

Ser Glu Asp Arg Pro Ser Ala Pro Tyr Asn Val His Ala Glu Thr Met
340 345 350

Ser Ser Ser Ala Ile Leu Leu Ala Trp Glu Arg Pro Leu Tyr Asn Ser
355 360 365

Asp Lys Val Ile Ala Tyr Ser Val His Tyr Met Lys Ala Glu Gly Leu
370 375 380

Asn Asn Glu Glu Tyr Gln Val Val Ile Gly Asn Asp Thr Thr His Tyr
385 390 395 400

Ile Ile Asp Asp Leu Glu Pro Ala Ser Asn Tyr Thr Phe Tyr Ile Val
405 410 415

Ala Tyr Met Pro Met Gly Ala Ser Gln Met Ser Asp His Val Thr Gln
420 425 430

Asn Thr Leu Glu Asp Asp Pro Arg Arg Lys Tyr His Val Arg Leu Leu
435 440 445

Ala Tyr Asn Asn Ile Asp Asp Gly Tyr Gln Ala Asp Gln Thr Val Ser
450 455 460

Thr Pro Gly Cys Val Ser Val Arg Asp Arg Met Val Pro Pro Pro Pro
465 470 475 480

Pro Pro His His Leu Tyr Ala Lys Ala Asn Thr Ser Ser Ser Ile Phe
485 490 495

Leu His Trp Arg Arg Pro Ala Phe Thr Ala Ala Gln Ile Ile Asn Tyr
500 505 510

Thr Ile Arg Cys Asn Pro Val Gly Leu Gln Asn Ala Ser Leu Val Leu
515 520 525

Tyr Leu Gln Thr Ser Glu Thr His Met Leu Val Gln Gly Leu Glu Pro
530 535 540

Asn Thr Lys Tyr Glu Phe Ala Val Arg Leu His Val Asp Gln Leu Ser
545 550 555 560

Ser Pro Trp Ser Pro Val Val Tyr His Ser Thr Leu Pro Glu Ala Pro
565 570 575

Ala Gly Pro Pro Val Gly Val Lys Val Thr Leu Ile Glu Asp Asp Thr
580 585 590

Ala Leu Val Ser Trp Lys Pro Pro Asp Gly Pro Glu Thr Val Val Thr
595 600 605

Arg Tyr Thr Ile Leu Tyr Ala Ser Arg Lys Ala Trp Ile Ala Gly Glu
610 615 620

Trp Gln Val Leu His Arg Glu Gly Ala Ile Thr Met Ala Leu Leu Glu
625 630 635 640

Asn Leu Val Ala Gly Asn Val Tyr Ile Val Lys Ile Ser Ala Ser Asn
645 650 655

Glu Val Gly Glu Gly Pro Phe Ser Asn Ser Val Glu Leu Ala Val Leu
660 665 670

Pro Lys Glu Thr Ser Glu Ser Asn Gln Arg Pro Lys Arg Leu Asp Ser
675 680 685

Ala Asp Ala Lys Val Tyr Ser Gly Tyr Tyr His Leu Asp Gln Lys Ser
690 695 700

Met Thr Gly Ile Ala Val Gly Val Gly Ile Ala Leu Thr Cys Ile Leu
705 710 715 720

Ile Cys Val Leu Ile Leu Ile Tyr Arg Ser Lys Ala Arg Lys Ser Ser
725 730 735

Ala Ser Lys Thr Ala Gln Asn Gly Thr Gln Gln Leu Pro Arg Thr Ser
740 745 750

Ala Ser Leu Ala Ser Gly Asn Glu Val Gly Lys Asn Leu Glu Gly Ala
755 760 765

Val Gly Asn Glu Glu Ser Leu Met Pro Met Ile Met Pro Asn Ser Phe
770 775 780

Ile Asp Ala Lys Gly Gly Thr Asp Leu Ile Ile Asn Ser Tyr Gly Pro
785 790 795 800

Ile Ile Lys Asn Asn Ser Lys Lys Lys Trp Phe Phe Phe Gln Asp Ser
805 810 815

Lys Lys Ile Gln Val Glu Gln Pro Gln Arg Arg Phe Thr Pro Ala Val
820 825 830

Cys Phe Tyr Gln Pro Gly Thr Thr Val Leu Ile Ser Asp Glu Asp Ser
835 840 845

Pro Ser Ser Pro Gly Gln Thr Thr Ser Phe Ser Arg Pro Phe Gly Val
850 855 860

Ala Ala Asp Thr Glu His Ser Ala Asn Ser Glu Gly Ser His Glu Thr
865 870 875 880

Gly Asp Ser Gly Arg Phe Ser His Glu Ser Asn Asp Glu Ile His Leu
885 890 895

Ser Ser Val Ile Ser Thr Thr Pro Pro Asn Leu
900 905



<210> SEQ ID NO 21 

<211> LENGTH: 2139

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 21 

atgtcatggt tgaaaaatgg aaggaagata cattcgaatg gtagaattaa aatgtacaac 60 

agtaaattgg taattaacca gattattcct gaagatgatg ctatttatca gtgcatggct 120 

gagaatagcc aaggatctat tttatctaga gccagactga ctgtagtgat gtcagaagac 180 

agacccagtg ctccctataa tgtacatgct gaaaccatgt caagctcagc cattctttta 240 

gcctgggaga ggccacttta taattcagac aaagtcattg cctattctgt acactacatg 300 

aaagcagaag gtttaaataa tgaagagtat caagtagtca tcggaaatga cacaactcat 360 

tatattattg atgacttaga gcctgccagc aattatactt tctacattgt agcatatatg 420 

ccaatgggag ccagccagat gtctgaccat gtgacacaga atactctaga ggatgttccc 480 

ctgagacctc ctgaaattag tttgacaagt cgaagtccca ctgatattct catctcctgg 540 

ctgccaatcc cagccaaata tcggcggggc caagtggtgc tgtatcgctt gtctttccgc 600 

ctaagtactg agaattcaat ccaagttctg gagctcccgg ggaccacgca tgagtacctt 660 

ttggaaggcc tgaaacctga cagtgtctac ctggttcgga ttactgctgc caccagagtg 720 

gggctgggag agtcatcagt atggacttca cataggacgc ccaaagctac aagcgtgaaa 780 

gcccctaagt ctccagagtt gcatttggag cctctgaact gtaccaccat ttctgtgagg 840 

tggcagcaag atgtagagga cacagctgct attcagggct acaagctgta ctacaaggaa 900 

gaagggcagc aggagaatgg gcccattttc ttggatacca aggacctact ctatactctc 960 

agtggcttag accccagaag aaaatatcat gtgagactcc tggcttacaa caacatagac 1020 

gatggctatc aggcagatca gactgtcagc actccaggat gcgtgtctgt tcgtgatcgc 1080 

atggtccctc ctccaccacc accccaccat ctctatgcga aggctaacac ctcatcttcc 1140 

atcttcctgc actggaggag gcctgcattc accgctgcac aaatcattaa ctacaccatc 1200 

cgctgtaatc ctgttggcct gcagaatgct tctttggttc tgtaccttca aacatcagaa 1260 

actcacatgt tggttcaagg tctagaacca aacaccaaat acgaatttgc cgttcgatta 1320 

catgtggatc agctttccag tccttggagc cctgtagtct accattctac tcttccagaa 1380 

gcaccagcag gcccaccagt tggagtaaaa gtgacattaa tagaggatga cactgccctg 1440 

gtttcttgga aaccccctga tggcccagaa acagttgtga cccgctatac tatcttatat 1500 

gcatctagga aggcctggat tgcaggagag tggcaggtct tacaccgtga aggggcaata 1560 

accatggctt tgctagaaaa cttggtagca ggaaatgtgt acattgtcaa gatatctgca 1620 

tccaatgagg tgggagaagg acccttttca aattctgtgg agctggcagt acttccaaag 1680 

gaaacctctg aatcaaatca gaggcccaag cgtttagatt ctgctgatgc caaagtttat 1740 

tcaggatatt accatctgga ccaaaaatca atgactggca ttgctgtagg tgttggcata 1800 

gccttgacct gcatcctcat ctgtgttctc atcttgatat accgaagtaa agccaggaaa 1860 






tcatctgctt ccaagacggc acagaatgga actcaacagt tacctcgtac cagtgcctcc 1920

ttagctagtg gaaatgaggt aggaaagaac ctggaaggag ctgtaggaaa tgaagaatct 1980

ttaatgccaa tgatcatgcc aaacagcttc attgatgcaa aggtactgag ctgcgggatt 2040

tgctgcataa gccgttcttc cattcctcct ccctgtgtgt gtaaaatgta cttcccccaa 2100

aattgtatgt tgaatgtatt ataccaatac tcttattaa 2139



<210> SEQ ID NO 22 

<211> LENGTH: 712 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 22

Met Ser Trp Leu Lys Asn Gly Arg Lys Ile His Ser Asn Gly Arg Ile
1 5 10 15

Lys Met Tyr Asn Ser Lys Leu Val Ile Asn Gln Ile Ile Pro Glu Asp
20 25 30

Asp Ala Ile Tyr Gln Cys Met Ala Glu Asn Ser Gln Gly Ser Ile Leu
35 40 45

Ser Arg Ala Arg Leu Thr Val Val Met Ser Glu Asp Arg Pro Ser Ala
50 55 60

Pro Tyr Asn Val His Ala Glu Thr Met Ser Ser Ser Ala Ile Leu Leu
65 70 75 80

Ala Trp Glu Arg Pro Leu Tyr Asn Ser Asp Lys Val Ile Ala Tyr Ser
85 90 95

Val His Tyr Met Lys Ala Glu Gly Leu Asn Asn Glu Glu Tyr Gln Val
100 105 110

Val Ile Gly Asn Asp Thr Thr His Tyr Ile Ile Asp Asp Leu Glu Pro
115 120 125

Ala Ser Asn Tyr Thr Phe Tyr Ile Val Ala Tyr Met Pro Met Gly Ala
130 135 140

Ser Gln Met Ser Asp His Val Thr Gln Asn Thr Leu Glu Asp Val Pro
145 150 155 160

Leu Arg Pro Pro Glu Ile Ser Leu Thr Ser Arg Ser Pro Thr Asp Ile
165 170 175

Leu Ile Ser Trp Leu Pro Ile Pro Ala Lys Tyr Arg Arg Gly Gln Val
180 185 190

Val Leu Tyr Arg Leu Ser Phe Arg Leu Ser Thr Glu Asn Ser Ile Gln
195 200 205

Val Leu Glu Leu Pro Gly Thr Thr His Glu Tyr Leu Leu Glu Gly Leu
210 215 220

Lys Pro Asp Ser Val Tyr Leu Val Arg Ile Thr Ala Ala Thr Arg Val
225 230 235 240

Gly Leu Gly Glu Ser Ser Val Trp Thr Ser His Arg Thr Pro Lys Ala
245 250 255

Thr Ser Val Lys Ala Pro Lys Ser Pro Glu Leu His Leu Glu Pro Leu
260 265 270

Asn Cys Thr Thr Ile Ser Val Arg Trp Gln Gln Asp Val Glu Asp Thr
275 280 285

Ala Ala Ile Gln Gly Tyr Lys Leu Tyr Tyr Lys Glu Glu Gly Gln Gln
290 295 300

Glu Asn Gly Pro Ile Phe Leu Asp Thr Lys Asp Leu Leu Tyr Thr Leu
305 310 315 320

Ser Gly Leu Asp Pro Arg Arg Lys Tyr His Val Arg Leu Leu Ala Tyr
325 330 335

Asn Asn Ile Asp Asp Gly Tyr Gln Ala Asp Gln Thr Val Ser Thr Pro
340 345 350

Gly Cys Val Ser Val Arg Asp Arg Met Val Pro Pro Pro Pro Pro Pro
355 360 365

His His Leu Tyr Ala Lys Ala Asn Thr Ser Ser Ser Ile Phe Leu His
370 375 380

Trp Arg Arg Pro Ala Phe Thr Ala Ala Gln Ile Ile Asn Tyr Thr Ile
385 390 395 400

Arg Cys Asn Pro Val Gly Leu Gln Asn Ala Ser Leu Val Leu Tyr Leu
405 410 415

Gln Thr Ser Glu Thr His Met Leu Val Gln Gly Leu Glu Pro Asn Thr
420 425 430

Lys Tyr Glu Phe Ala Val Arg Leu His Val Asp Gln Leu Ser Ser Pro
435 440 445

Trp Ser Pro Val Val Tyr His Ser Thr Leu Pro Glu Ala Pro Ala Gly
450 455 460

Pro Pro Val Gly Val Lys Val Thr Leu Ile Glu Asp Asp Thr Ala Leu
465 470 475 480

Val Ser Trp Lys Pro Pro Asp Gly Pro Glu Thr Val Val Thr Arg Tyr
485 490 495

Thr Ile Leu Tyr Ala Ser Arg Lys Ala Trp Ile Ala Gly Glu Trp Gln
500 505 510

Val Leu His Arg Glu Gly Ala Ile Thr Met Ala Leu Leu Glu Asn Leu
515 520 525

Val Ala Gly Asn Val Tyr Ile Val Lys Ile Ser Ala Ser Asn Glu Val
530 535 540

Gly Glu Gly Pro Phe Ser Asn Ser Val Glu Leu Ala Val Leu Pro Lys
545 550 555 560

Glu Thr Ser Glu Ser Asn Gln Arg Pro Lys Arg Leu Asp Ser Ala Asp
565 570 575

Ala Lys Val Tyr Ser Gly Tyr Tyr His Leu Asp Gln Lys Ser Met Thr
580 585 590

Gly Ile Ala Val Gly Val Gly Ile Ala Leu Thr Cys Ile Leu Ile Cys
595 600 605

Val Leu Ile Leu Ile Tyr Arg Ser Lys Ala Arg Lys Ser Ser Ala Ser
610 615 620

Lys Thr Ala Gln Asn Gly Thr Gln Gln Leu Pro Arg Thr Ser Ala Ser
625 630 635 640

Leu Ala Ser Gly Asn Glu Val Gly Lys Asn Leu Glu Gly Ala Val Gly
645 650 655

Asn Glu Glu Ser Leu Met Pro Met Ile Met Pro Asn Ser Phe Ile Asp
660 665 670

Ala Lys Val Leu Ser Cys Gly Ile Cys Cys Ile Ser Arg Ser Ser Ile
675 680 685

Pro Pro Pro Cys Val Cys Lys Met Tyr Phe Pro Gln Asn Cys Met Leu
690 695 700

Asn Val Leu Tyr Gln Tyr Ser Tyr
705 710



<210> SEQ ID NO 23 

<211> LENGTH: 1875 

<212> TYPE: DNA

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 23

atggaaggaa gatacattcg aatggtagaa ttaaaatgta caacaggttt aaataatgaa 60 

gagtatcaag tagtcatcgg aaatgacaca actcattata ttattgatga cttagagcct 120

gccagcaatt atactttcta cattgtagca tatatgccaa tgggagccag ccagatgtct 180 

gaccatgtga cacagaatac tctagaggat gttcccctga gacctcctga aattagtttg 240 

acaagtcgaa gtcccactga tattctcatc tcctggctgc caatcccagc caaatatcgg 300 

cggggccaag tggtgctgta tcgcttgtct ttccgcctaa gtactgagaa ttcaatccaa 360 

gttctggagc tcccggggac cacgcatgag taccttttgg aaggcctgaa acctgacagt 420 

gtctacctgg ttcggattac tgctgccacc agagtggggc tgggagagtc atcagtatgg 480 

acttcacata ggacgcccaa agctacaagc gtgaaagccc ctaagtctcc agagttgcat 540 

ttggagcctc tgaactgtac caccatttct gtgaggtggc agcaagatgt agaggacaca 600 

gctgctattc agggctacaa gctgtactac aaggaagaag ggcagcagga gaatgggccc 660 

attttcttgg ataccaagga cctactctat actctcagtg gcttagaccc cagaagaaaa 720 

tatcatgtga gactcctggc ttacaacaac atagacgatg gctatcaggc agatcagact 780 

gtcagcactc caggatgcgt gtctgttcgt gatcgcatgg tccctcctcc accaccaccc 840 

caccatctct atgcgaaggc taacacctca tcttccatct tcctgcactg gaggaggcct 900 

gcattcaccg ctgcacaaat cattaactac accatccgct gtaatcctgt tggcctgcag 960 

aatgcttctt tggttctgta ccttcaaaca tcagaaactc acatgttggt tcaaggtcta 1020 

gaaccaaaca ccaaatacga atttgccgtt cgattacatg tggatcagct ttccagtcct 1080 

tggagccctg tagtctacca ttctactctt ccagaagcac cagcaggccc accagttgga 1140 

gtaaaagtga cattaataga ggatgacact gccctggttt cttggaaacc ccctgatggc 1200 

ccagaaacag ttgtgacccg ctatactatc ttatatgcat ctaggaaggc ctggattgca 1260 

ggagagtggc aggtcttaca ccgtgaaggg gcaataacca tggctttgct agaaaacttg 1320 

gtagcaggaa atgtgtacat tgtcaagata tctgcatcca atgaggtggg agaaggaccc 1380 

ttttcaaatt ctgtggagct ggcagtactt ccaaaggaaa cctctgaatc aaatcagagg 1440 

cccaagcgtt tagattctgc tgatgccaaa gtttattcag gatattacca tctggaccaa 1500 

aaatcaatga ctggcattgc tgtaggtgtt ggcatagcct tgacctgcat cctcatctgt 1560 

gttctcatct tgatataccg aagtaaagcc aggaaatcat ctgcttccaa gacggcacag 1620 

aatggaactc aacagttacc tcgtaccagt gcctccttag ctagtggaaa tgaggtagga 1680 

aagaacctgg aaggagctgt aggaaatgaa gaatctttaa tgccaatgat catgccaaac 1740 

agcttcattg atgcaaaggt actgagctgc gggatttgct gcataagccg ttcttccatt 1800 

cctcctccct gtgtgtgtaa aatgtacttc ccccaaaatt gtatgttgaa tgtattatac 1860 

caatactctt attaa 1875 

<210> SEQ ID NO 24 

<211> LENGTH: 624 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 24 

Met Glu Gly Arg Tyr Ile Arg Met Val Glu Leu Lys Cys Thr Thr Gly 
1 5 10 15 

Leu Asn Asn Glu Glu Tyr Gln Val Val Ile Gly Asn Asp Thr Thr His 
20 25 30 

Tyr Ile Ile Asp Asp Leu Glu Pro Ala Ser Asn Tyr Thr Phe Tyr Ile 
35 40 45 

Val Ala Tyr Met Pro Met Gly Ala Ser Gln Met Ser Asp His Val Thr 
50 55 60 

Gln Asn Thr Leu Glu Asp Val Pro Leu Arg Pro Pro Glu Ile Ser Leu 
65 70 75 80 

Thr Ser Arg Ser Pro Thr Asp Ile Leu Ile Ser Trp Leu Pro Ile Pro 
85 90 95 

Ala Lys Tyr Arg Arg Gly Gln Val Val Leu Tyr Arg Leu Ser Phe Arg 
100 105 110 

Leu Ser Thr Glu Asn Ser Ile Gln Val Leu Glu Leu Pro Gly Thr Thr 
115 120 125 

His Glu Tyr Leu Leu Glu Gly Leu Lys Pro Asp Ser Val Tyr Leu Val 
130 135 140 

Arg Ile Thr Ala Ala Thr Arg Val Gly Leu Gly Glu Ser Ser Val Trp 
145 150 155 160 

Thr Ser His Arg Thr Pro Lys Ala Thr Ser Val Lys Ala Pro Lys Ser 
165 170 175 

Pro Glu Leu His Leu Glu Pro Leu Asn Cys Thr Thr Ile Ser Val Arg 
180 185 190 

Trp Gln Gln Asp Val Glu Asp Thr Ala Ala Ile Gln Gly Tyr Lys Leu 
195 200 205 

Tyr Tyr Lys Glu Glu Gly Gln Gln Glu Asn Gly Pro Ile Phe Leu Asp 
210 215 220 

Thr Lys Asp Leu Leu Tyr Thr Leu Ser Gly Leu Asp Pro Arg Arg Lys 
225 230 235 240 

Tyr His Val Arg Leu Leu Ala Tyr Asn Asn Ile Asp Asp Gly Tyr Gln 
245 250 255 

Ala Asp Gln Thr Val Ser Thr Pro Gly Cys Val Ser Val Arg Asp Arg 
260 265 270 

Met Val Pro Pro Pro Pro Pro Pro His His Leu Tyr Ala Lys Ala Asn 
275 280 285 

Thr Ser Ser Ser Ile Phe Leu His Trp Arg Arg Pro Ala Phe Thr Ala 
290 295 300 

Ala Gln Ile Ile Asn Tyr Thr Ile Arg Cys Asn Pro Val Gly Leu Gln 
305 310 315 320 

Asn Ala Ser Leu Val Leu Tyr Leu Gln Thr Ser Glu Thr His Met Leu 
325 330 335 

Val Gln Gly Leu Glu Pro Asn Thr Lys Tyr Glu Phe Ala Val Arg Leu 
340 345 350 

His Val Asp Gln Leu Ser Ser Pro Trp Ser Pro Val Val Tyr His Ser 
355 360 365 

Thr Leu Pro Glu Ala Pro Ala Gly Pro Pro Val Gly Val Lys Val Thr 
370 375 380 

Leu Ile Glu Asp Asp Thr Ala Leu Val Ser Trp Lys Pro Pro Asp Gly 
385 390 395 400 

Pro Glu Thr Val Val Thr Arg Tyr Thr Ile Leu Tyr Ala Ser Arg Lys 
405 410 415 

Ala Trp Ile Ala Gly Glu Trp Gln Val Leu His Arg Glu Gly Ala Ile 
420 425 430 

Thr Met Ala Leu Leu Glu Asn Leu Val Ala Gly Asn Val Tyr Ile Val 
435 440 445 

Lys Ile Ser Ala Ser Asn Glu Val Gly Glu Gly Pro Phe Ser Asn Ser 
450 455 460 

Val Glu Leu Ala Val Leu Pro Lys Glu Thr Ser Glu Ser Asn Gln Arg 
465 470 475 480 

Pro Lys Arg Leu Asp Ser Ala Asp Ala Lys Val Tyr Ser Gly Tyr Tyr 
485 490 495 

His Leu Asp Gln Lys Ser Met Thr Gly Ile Ala Val Gly Val Gly Ile 
500 505 510 

Ala Leu Thr Cys Ile Leu Ile Cys Val Leu Ile Leu Ile Tyr Arg Ser 
515 520 525 

Lys Ala Arg Lys Ser Ser Ala Ser Lys Thr Ala Gln Asn Gly Thr Gln 
530 535 540 

Gln Leu Pro Arg Thr Ser Ala Ser Leu Ala Ser Gly Asn Glu Val Gly 
545 550 555 560 

Lys Asn Leu Glu Gly Ala Val Gly Asn Glu Glu Ser Leu Met Pro Met 
565 570 575 

Ile Met Pro Asn Ser Phe Ile Asp Ala Lys Val Leu Ser Cys Gly Ile 
580 585 590 

Cys Cys Ile Ser Arg Ser Ser Ile Pro Pro Pro Cys Val Cys Lys Met 
595 600 605 

Tyr Phe Pro Gln Asn Cys Met Leu Asn Val Leu Tyr Gln Tyr Ser Tyr 
610 615 620 



<210> SEQ ID NO 25 
<211> LENGTH: 1644 
<212> TYPE: DNA 
<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 25 

atgtcatggt tgaaaaatgg aaggaagata cattcgaatg gtagaattaa aatgtacaac 60 

agtaaattgg taattaacca gattattcct gaagatgatg ctatttatca gtgcatggct 120 

gagaatagcc aaggatctat tttatctaga gccagactga ctgtagtgat gtcagaagac 180 

agacccagtg ctccctataa tgtacatgct gaaaccatgt caagctcagc cattctttta 240 

gcctgggaga ggccacttta taattcagac aaagtcattg cctattctgt acactacatg 300 

aaagcagaag gtttaaataa tgaagagtat caagtagtca tcggaaatga cacaactcat 360 

tatattattg atgacttaga gcctgccagc aattatactt tctacattgt agcatatatg 420 

ccaatgggag ccagccagat gtctgaccat gtgacacaga atactctaga ggatgacccc 480 

agaagaaaat atcatgtgag actcctggct tacaacaaca tagacgatgg ctatcaggca 540 

gatcagactg tcagcactcc aggatgcgtg tctgttcgtg atcgcatggt ccctcctcca 600 

ccaccacccc accatctcta tgcgaaggct aacacctcat cttccatctt cctgcactgg 660 

aggaggcctg cattcaccgc tgcacaaatc attaactaca ccatccgctg taatcctgtt 720 

ggcctgcaga atgcttcttt ggttctgtac cttcaaacat cagaaactca catgttggtt 780 

caaggtctag aaccaaacac caaatacgaa tttgccgttc gattacatgt ggatcagctt 840 

tccagtcctt ggagccctgt agtctaccat tctactcttc cagaagcacc agcaggccca 900 

ccagttggag taaaagtgac attaatagag gatgacactg ccctggtttc ttggaaaccc 960 

cctgatggcc cagaaacagt tgtgacccgc tatactatct tatatgcatc taggaaggcc 1020 

tggattgcag gagagtggca ggtcttacac cgtgaagggg caataaccat ggctttgcta 1080 

gaaaacttgg tagcaggaaa tgtgtacatt gtcaagatat ctgcatccaa tgaggtggga 1140 

gaaggaccct tttcaaattc tgtggagctg gcagtacttc caaaggaaac ctctgaatca 1200 

aatcagaggc ccaagcgttt agattctgct gatgccaaag tttattcagg atattaccat 1260 

ctggaccaaa aatcaatgac tggcattgct gtaggtgttg gcatagcctt gacctgcatc 1320 

ctcatctgtg ttctcatctt gatataccga agtaaagcca ggaaatcatc tgcttccaag 1380 

acggcacaga atggaactca acagttacct cgtaccagtg cctccttagc tagtggaaat 1440 

gaggtaggaa agaacctgga aggagctgta ggaaatgaag aatctttaat gccaatgatc 1500 

atgccaaaca gcttcattga tgcaaaggta ctgagctgcg ggatttgctg cataagccgt 1560 

tcttccattc ctcctccctg tgtgtgtaaa atgtacttcc cccaaaattg tatgttgaat 1620 

gtattatacc aatactctta ttaa 1644 

<210> SEQ ID NO 26 

<211> LENGTH: 547 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 26 

Met Ser Trp Leu Lys Asn Gly Arg Lys Ile His Ser Asn Gly Arg Ile 
1 5 10 15 

Lys Met Tyr Asn Ser Lys Leu Val Ile Asn Gln Ile Ile Pro Glu Asp 
20 25 30 

Asp Ala Ile Tyr Gln Cys Met Ala Glu Asn Ser Gln Gly Ser Ile Leu 
35 40 45 

Ser Arg Ala Arg Leu Thr Val Val Met Ser Glu Asp Arg Pro Ser Ala 
50 55 60 

Pro Tyr Asn Val His Ala Glu Thr Met Ser Ser Ser Ala Ile Leu Leu 
65 70 75 80 

Ala Trp Glu Arg Pro Leu Tyr Asn Ser Asp Lys Val Ile Ala Tyr Ser 
85 90 95 

Val His Tyr Met Lys Ala Glu Gly Leu Asn Asn Glu Glu Tyr Gln Val 
100 105 110 

Val Ile Gly Asn Asp Thr Thr His Tyr Ile Ile Asp Asp Leu Glu Pro 
115 120 125 

Ala Ser Asn Tyr Thr Phe Tyr Ile Val Ala Tyr Met Pro Met Gly Ala 
130 135 140 

Ser Gln Met Ser Asp His Val Thr Gln Asn Thr Leu Glu Asp Asp Pro 
145 150 155 160 

Arg Arg Lys Tyr His Val Arg Leu Leu Ala Tyr Asn Asn Ile Asp Asp 
165 170 175 

Gly Tyr Gln Ala Asp Gln Thr Val Ser Thr Pro Gly Cys Val Ser Val 
180 185 190 

Arg Asp Arg Met Val Pro Pro Pro Pro Pro Pro His His Leu Tyr Ala 
195 200 205 

Lys Ala Asn Thr Ser Ser Ser Ile Phe Leu His Trp Arg Arg Pro Ala 
210 215 220 

Phe Thr Ala Ala Gln Ile Ile Asn Tyr Thr Ile Arg Cys Asn Pro Val 
225 230 235 240 

Gly Leu Gln Asn Ala Ser Leu Val Leu Tyr Leu Gln Thr Ser Glu Thr 
245 250 255 

His Met Leu Val Gln Gly Leu Glu Pro Asn Thr Lys Tyr Glu Phe Ala 
260 265 270 

Val Arg Leu His Val Asp Gln Leu Ser Ser Pro Trp Ser Pro Val Val 
275 280 285 

Tyr His Ser Thr Leu Pro Glu Ala Pro Ala Gly Pro Pro Val Gly Val 
290 295 300 

Lys Val Thr Leu Ile Glu Asp Asp Thr Ala Leu Val Ser Trp Lys Pro 
305 310 315 320 

Pro Asp Gly Pro Glu Thr Val Val Thr Arg Tyr Thr Ile Leu Tyr Ala 
325 330 335 

Ser Arg Lys Ala Trp Ile Ala Gly Glu Trp Gln Val Leu His Arg Glu 
340 345 350 

Gly Ala Ile Thr Met Ala Leu Leu Glu Asn Leu Val Ala Gly Asn Val 
355 360 365 

Tyr Ile Val Lys Ile Ser Ala Ser Asn Glu Val Gly Glu Gly Pro Phe 
370 375 380 

Ser Asn Ser Val Glu Leu Ala Val Leu Pro Lys Glu Thr Ser Glu Ser 
385 390 395 400 

Asn Gln Arg Pro Lys Arg Leu Asp Ser Ala Asp Ala Lys Val Tyr Ser 
405 410 415 

Gly Tyr Tyr His Leu Asp Gln Lys Ser Met Thr Gly Ile Ala Val Gly 
420 425 430 

Val Gly Ile Ala Leu Thr Cys Ile Leu Ile Cys Val Leu Ile Leu Ile 
435 440 445 

Tyr Arg Ser Lys Ala Arg Lys Ser Ser Ala Ser Lys Thr Ala Gln Asn 
450 455 460 

Gly Thr Gln Gln Leu Pro Arg Thr Ser Ala Ser Leu Ala Ser Gly Asn 
465 470 475 480 

Glu Val Gly Lys Asn Leu Glu Gly Ala Val Gly Asn Glu Glu Ser Leu 
485 490 495 

Met Pro Met Ile Met Pro Asn Ser Phe Ile Asp Ala Lys Val Leu Ser 
500 505 510 

Cys Gly Ile Cys Cys Ile Ser Arg Ser Ser Ile Pro Pro Pro Cys Val 
515 520 525 

Cys Lys Met Tyr Phe Pro Gln Asn Cys Met Leu Asn Val Leu Tyr Gln 
530 535 540 

Tyr Ser Tyr 
545 



<210> SEQ ID NO 27 

<211> LENGTH: 2382 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 27 

atgtcatggt tgaaaaatgg aaggaagata cattcgaatg gtagaattaa aatgtacaac 60 

agtaaattgg taattaacca gattattcct gaagatgatg ctatttatca gtgcatggct 120 

gagaatagcc aaggatctat tttatctaga gccagactga ctgtagtgat gtcagaagac 180 

agacccagtg ctccctataa tgtacatgct gaaaccatgt caagctcagc cattctttta 240 

gcctgggaga ggccacttta taattcagac aaagtcattg cctattctgt acactacatg 300 

aaagcagaag gtttaaataa tgaagagtat caagtagtca tcggaaatga cacaactcat 360 

tatattattg atgacttaga gcctgccagc aattatactt tctacattgt agcatatatg 420 

ccaatgggag ccagccagat gtctgaccat gtgacacaga atactctaga ggatgttccc 480 

ctgagacctc ctgaaattag tttgacaagt cgaagtccca ctgatattct catctcctgg 540 

ctgccaatcc cagccaaata tcggcggggc caagtggtgc tgtatcgctt gtctttccgc 600 

ctaagtactg agaattcaat ccaagttctg gagctcccgg ggaccacgca tgagtacctt 660 

ttggaaggcc tgaaacctga cagtgtctac ctggttcgga ttactgctgc caccagagtg 720 

gggctgggag agtcatcagt atggacttca cataggacgc ccaaagctac aagcgtgaaa 780 

gcccctaagt ctccagagtt gcatttggag cctctgaact gtaccaccat ttctgtgagg 840 

tggcagcaag atgtagagga cacagctgct attcagggct acaagctgta ctacaaggaa 900 

gaagggcagc aggagaatgg gcccattttc ttggatacca aggacctact ctatactctc 960 

agtggcttag accccagaag aaaatatcat gtgagactcc tggcttacaa caacatagac 1020 

gatggctatc aggcagatca gactgtcagc actccaggat gcgtgtctgt tcgtgatcgc 1080 

atggtccctc ctccaccacc accccaccat ctctatgcga aggctaacac ctcatcttcc 1140 

atcttcctgc actggaggag gcctgcattc accgctgcac aaatcattaa ctacaccatc 1200 

cgctgtaatc ctgttggcct gcagaatgct tctttggttc tgtaccttca aacatcagaa 1260 

actcacatgt tggttcaagg tctagaacca aacaccaaat acgaatttgc cgttcgatta 1320 

catgtggatc agctttccag tccttggagc cctgtagtct accattctac tcttccagaa 1380 

gcaccagcag gcccaccagt tggagtaaaa gtgacattaa tagaggatga cactgccctg 1440 

gtttcttgga aaccccctga tggcccagaa acagttgtga cccgctatac tatcttatat 1500 

gcatctagga aggcctggat tgcaggagag tggcaggtct tacaccgtga aggggcaata 1560 

accatggctt tgctagaaaa cttggtagca ggaaatgtgt acattgtcaa gatatctgca 1620 

tccaatgagg tgggagaagg acccttttca aattctgtgg agctggcagt acttccaaag 1680 

gaaacctctg aatcaaatca gaggcccaag cgtttagatt ctgctgatgc caaagtttat 1740 

tcaggatatt accatctgga ccaaaaatca atgactggca ttgctgtagg tgttggcata 1800 

gccttgacct gcatcctcat ctgtgttctc atcttgatat accgaagtaa agccaggaaa 1860 

tcatctgctt ccaagacggc acagaatgga actcaacagt tacctcgtac cagtgcctcc 1920 

ttagctagtg gaaatgaggt aggaaagaac ctggaaggag ctgtaggaaa tgaagaatct 1980 

ttaatgccaa tgatcatgcc aaacagcttc attgatgcaa agggaggaac tgacctgata 2040 

attaatagct atggtcctat aattaaaaac aactctaaga aaaagtggtt ttttttccaa 2100 

gactcaaaga agatacaagt tgagcagcct caaagaagat ttactccagc ggtctgcttt 2160 

taccagccag gcaccactgt attaatcagt gatgaagact cccctagctc cccaggtcag 2220 

acaaccagct tctcaagacc ctttggtgtt gcagctgata cagaacattc agcaaatagt 2280 

gaaggcagcc atgagactgg ggattctggg cggttttctc atgagtccaa cgatgagata 2340 

catctgtcct cagttataag taccacaccc cccaacctct ga 2382 



<210> SEQ ID NO 28 

<211> LENGTH: 793 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 28 

Met Ser Trp Leu Lys Asn Gly Arg Lys Ile His Ser Asn Gly Arg Ile 
1 5 10 15 

Lys Met Tyr Asn Ser Lys Leu Val Ile Asn Gln Ile Ile Pro Glu Asp 
20 25 30 

Asp Ala Ile Tyr Gln Cys Met Ala Glu Asn Ser Gln Gly Ser Ile Leu 
35 40 45 

Ser Arg Ala Arg Leu Thr Val Val Met Ser Glu Asp Arg Pro Ser Ala 
50 55 60 

Pro Tyr Asn Val His Ala Glu Thr Met Ser Ser Ser Ala Ile Leu Leu 
65 70 75 80 

Ala Trp Glu Arg Pro Leu Tyr Asn Ser Asp Lys Val Ile Ala Tyr Ser 
85 90 95 

Val His Tyr Met Lys Ala Glu Gly Leu Asn Asn Glu Glu Tyr Gln Val 
100 105 110 

Val Ile Gly Asn Asp Thr Thr His Tyr Ile Ile Asp Asp Leu Glu Pro 
115 120 125 

Ala Ser Asn Tyr Thr Phe Tyr Ile Val Ala Tyr Met Pro Met Gly Ala 
130 135 140 

Ser Gln Met Ser Asp His Val Thr Gln Asn Thr Leu Glu Asp Val Pro 
145 150 155 160 

Leu Arg Pro Pro Glu Ile Ser Leu Thr Ser Arg Ser Pro Thr Asp Ile 
165 170 175 

Leu Ile Ser Trp Leu Pro Ile Pro Ala Lys Tyr Arg Arg Gly Gln Val 
180 185 190 

Val Leu Tyr Arg Leu Ser Phe Arg Leu Ser Thr Glu Asn Ser Ile Gln 
195 200 205 

Val Leu Glu Leu Pro Gly Thr Thr His Glu Tyr Leu Leu Glu Gly Leu 
210 215 220 

Lys Pro Asp Ser Val Tyr Leu Val Arg Ile Thr Ala Ala Thr Arg Val 
225 230 235 240 

Gly Leu Gly Glu Ser Ser Val Trp Thr Ser His Arg Thr Pro Lys Ala 
245 250 255 

Thr Ser Val Lys Ala Pro Lys Ser Pro Glu Leu His Leu Glu Pro Leu 
260 265 270 

Asn Cys Thr Thr Ile Ser Val Arg Trp Gln Gln Asp Val Glu Asp Thr 
275 280 285 

Ala Ala Ile Gln Gly Tyr Lys Leu Tyr Tyr Lys Glu Glu Gly Gln Gln 
290 295 300 

Glu Asn Gly Pro Ile Phe Leu Asp Thr Lys Asp Leu Leu Tyr Thr Leu 
305 310 315 320 

Ser Gly Leu Asp Pro Arg Arg Lys Tyr His Val Arg Leu Leu Ala Tyr 
325 330 335 

Asn Asn Ile Asp Asp Gly Tyr Gln Ala Asp Gln Thr Val Ser Thr Pro 
340 345 350 

Gly Cys Val Ser Val Arg Asp Arg Met Val Pro Pro Pro Pro Pro Pro 
355 360 365 

His His Leu Tyr Ala Lys Ala Asn Thr Ser Ser Ser Ile Phe Leu His 
370 375 380 

Trp Arg Arg Pro Ala Phe Thr Ala Ala Gln Ile Ile Asn Tyr Thr Ile 
385 390 395 400 

Arg Cys Asn Pro Val Gly Leu Gln Asn Ala Ser Leu Val Leu Tyr Leu 
405 410 415 

Gln Thr Ser Glu Thr His Met Leu Val Gln Gly Leu Glu Pro Asn Thr 
420 425 430 

Lys Tyr Glu Phe Ala Val Arg Leu His Val Asp Gln Leu Ser Ser Pro 
435 440 445 

Trp Ser Pro Val Val Tyr His Ser Thr Leu Pro Glu Ala Pro Ala Gly 
450 455 460 

Pro Pro Val Gly Val Lys Val Thr Leu Ile Glu Asp Asp Thr Ala Leu 
465 470 475 480 

Val Ser Trp Lys Pro Pro Asp Gly Pro Glu Thr Val Val Thr Arg Tyr 
485 490 495 



Thr Ile Leu Tyr Ala Ser Arg Lys Ala Trp Ile Ala Gly Glu Trp Gln 
500 505 510 

Val Leu His Arg Glu Gly Ala Ile Thr Met Ala Leu Leu Glu Asn Leu 
515 520 525 

Val Ala Gly Asn Val Tyr Ile Val Lys Ile Ser Ala Ser Asn Glu Val 
530 535 540 

Gly Glu Gly Pro Phe Ser Asn Ser Val Glu Leu Ala Val Leu Pro Lys 
545 550 555 560 

Glu Thr Ser Glu Ser Asn Gln Arg Pro Lys Arg Leu Asp Ser Ala Asp 
565 570 575 

Ala Lys Val Tyr Ser Gly Tyr Tyr His Leu Asp Gln Lys Ser Met Thr 
580 585 590 

Gly Ile Ala Val Gly Val Gly Ile Ala Leu Thr Cys Ile Leu Ile Cys 
595 600 605 

Val Leu Ile Leu Ile Tyr Arg Ser Lys Ala Arg Lys Ser Ser Ala Ser 
610 615 620 

Lys Thr Ala Gln Asn Gly Thr Gln Gln Leu Pro Arg Thr Ser Ala Ser 
625 630 635 640 

Leu Ala Ser Gly Asn Glu Val Gly Lys Asn Leu Glu Gly Ala Val Gly 
645 650 655 

Asn Glu Glu Ser Leu Met Pro Met Ile Met Pro Asn Ser Phe Ile Asp 
660 665 670 

Ala Lys Gly Gly Thr Asp Leu Ile Ile Asn Ser Tyr Gly Pro Ile Ile 
675 680 685 

Lys Asn Asn Ser Lys Lys Lys Trp Phe Phe Phe Gln Asp Ser Lys Lys 
690 695 700 

Ile Gln Val Glu Gln Pro Gln Arg Arg Phe Thr Pro Ala Val Cys Phe 
705 710 715 720 

Tyr Gln Pro Gly Thr Thr Val Leu Ile Ser Asp Glu Asp Ser Pro Ser 
725 730 735 

Ser Pro Gly Gln Thr Thr Ser Phe Ser Arg Pro Phe Gly Val Ala Ala 
740 745 750 

Asp Thr Glu His Ser Ala Asn Ser Glu Gly Ser His Glu Thr Gly Asp 
755 760 765 

Ser Gly Arg Phe Ser His Glu Ser Asn Asp Glu Ile His Leu Ser Ser 
770 775 780 

Val Ile Ser Thr Thr Pro Pro Asn Leu 
785 790 



<210> SEQ ID NO 29 

<211> LENGTH: 1887 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 29 

atgtcatggt tgaaaaatgg aaggaagata cattcgaatg gtagaattaa aatgtacaac 60 

agtaaattgg taattaacca gattattcct gaagatgatg ctatttatca gtgcatggct 120 

gagaatagcc aaggatctat tttatctaga gccagactga ctgtagtgat gtcagaagac 180 

agacccagtg ctccctataa tgtacatgct gaaaccatgt caagctcagc cattctttta 240 

gcctgggaga ggccacttta taattcagac aaagtcattg cctattctgt acactacatg 300 

aaagcagaag gtttaaataa tgaagagtat caagtagtca tcggaaatga cacaactcat 360 

tatattattg atgacttaga gcctgccagc aattatactt tctacattgt agcatatatg 420 

ccaatgggag ccagccagat gtctgaccat gtgacacaga atactctaga ggatgacccc 480 

agaagaaaat atcatgtgag actcctggct tacaacaaca tagacgatgg ctatcaggca 540 

gatcagactg tcagcactcc aggatgcgtg tctgttcgtg atcgcatggt ccctcctcca 600 

ccaccacccc accatctcta tgcgaaggct aacacctcat cttccatctt cctgcactgg 660 

aggaggcctg cattcaccgc tgcacaaatc attaactaca ccatccgctg taatcctgtt 720 

ggcctgcaga atgcttcttt ggttctgtac cttcaaacat cagaaactca catgttggtt 780 

caaggtctag aaccaaacac caaatacgaa tttgccgttc gattacatgt ggatcagctt 840 

tccagtcctt ggagccctgt agtctaccat tctactcttc cagaagcacc agcaggccca 900 

ccagttggag taaaagtgac attaatagag gatgacactg ccctggtttc ttggaaaccc 960 

cctgatggcc cagaaacagt tgtgacccgc tatactatct tatatgcatc taggaaggcc 1020 

tggattgcag gagagtggca ggtcttacac cgtgaagggg caataaccat ggctttgcta 1080 

gaaaacttgg tagcaggaaa tgtgtacatt gtcaagatat ctgcatccaa tgaggtggga 1140 

gaaggaccct tttcaaattc tgtggagctg gcagtacttc caaaggaaac ctctgaatca 1200 

aatcagaggc ccaagcgttt agattctgct gatgccaaag tttattcagg atattaccat 1260 

ctggaccaaa aatcaatgac tggcattgct gtaggtgttg gcatagcctt gacctgcatc 1320 

ctcatctgtg ttctcatctt gatataccga agtaaagcca ggaaatcatc tgcttccaag 1380 

acggcacaga atggaactca acagttacct cgtaccagtg cctccttagc tagtggaaat 1440 

gaggtaggaa agaacctgga aggagctgta ggaaatgaag aatctttaat gccaatgatc 1500 

atgccaaaca gcttcattga tgcaaaggga ggaactgacc tgataattaa tagctatggt 1560 

cctataatta aaaacaactc taagaaaaag tggttttttt tccaagactc aaagaagata 1620 

caagttgagc agcctcaaag aagatttact ccagcggtct gcttttacca gccaggcacc 1680 

actgtattaa tcagtgatga agactcccct agctccccag gtcagacaac cagcttctca 1740 

agaccctttg gtgttgcagc tgatacagaa cattcagcaa atagtgaagg cagccatgag 1800 

actggggatt ctgggcggtt ttctcatgag tccaacgatg agatacatct gtcctcagtt 1860 

ataagtacca caccccccaa cctctga 1887 

<210> SEQ ID NO 30 

<211> LENGTH: 628 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 30 

Met Ser Trp Leu Lys Asn Gly Arg Lys Ile His Ser Asn Gly Arg Ile 
1 5 10 15 

Lys Met Tyr Asn Ser Lys Leu Val Ile Asn Gln Ile Ile Pro Glu Asp 
20 25 30 

Asp Ala Ile Tyr Gln Cys Met Ala Glu Asn Ser Gln Gly Ser Ile Leu 
35 40 45 

Ser Arg Ala Arg Leu Thr Val Val Met Ser Glu Asp Arg Pro Ser Ala 
50 55 60 

Pro Tyr Asn Val His Ala Glu Thr Met Ser Ser Ser Ala Ile Leu Leu 
65 70 75 80 

Ala Trp Glu Arg Pro Leu Tyr Asn Ser Asp Lys Val Ile Ala Tyr Ser 
85 90 95 

Val His Tyr Met Lys Ala Glu Gly Leu Asn Asn Glu Glu Tyr Gln Val 
100 105 110 

Val Ile Gly Asn Asp Thr Thr His Tyr Ile Ile Asp Asp Leu Glu Pro 
115 120 125 

Ala Ser Asn Tyr Thr Phe Tyr Ile Val Ala Tyr Met Pro Met Gly Ala 
130 135 140 

Ser Gln Met Ser Asp His Val Thr Gln Asn Thr Leu Glu Asp Asp Pro 
145 150 155 160 

Arg Arg Lys Tyr His Val Arg Leu Leu Ala Tyr Asn Asn Ile Asp Asp 
165 170 175 

Gly Tyr Gln Ala Asp Gln Thr Val Ser Thr Pro Gly Cys Val Ser Val 
180 185 190 

Arg Asp Arg Met Val Pro Pro Pro Pro Pro Pro His His Leu Tyr Ala 
195 200 205 

Lys Ala Asn Thr Ser Ser Ser Ile Phe Leu His Trp Arg Arg Pro Ala 
210 215 220 

Phe Thr Ala Ala Gln Ile Ile Asn Tyr Thr Ile Arg Cys Asn Pro Val 
225 230 235 240 

Gly Leu Gln Asn Ala Ser Leu Val Leu Tyr Leu Gln Thr Ser Glu Thr 
245 250 255 

His Met Leu Val Gln Gly Leu Glu Pro Asn Thr Lys Tyr Glu Phe Ala 
260 265 270 

Val Arg Leu His Val Asp Gln Leu Ser Ser Pro Trp Ser Pro Val Val 
275 280 285 

Tyr His Ser Thr Leu Pro Glu Ala Pro Ala Gly Pro Pro Val Gly Val 
290 295 300 

Lys Val Thr Leu Ile Glu Asp Asp Thr Ala Leu Val Ser Trp Lys Pro 
305 310 315 320 

Pro Asp Gly Pro Glu Thr Val Val Thr Arg Tyr Thr Ile Leu Tyr Ala 
325 330 335 

Ser Arg Lys Ala Trp Ile Ala Gly Glu Trp Gln Val Leu His Arg Glu 
340 345 350 

Gly Ala Ile Thr Met Ala Leu Leu Glu Asn Leu Val Ala Gly Asn Val 
355 360 365 

Tyr Ile Val Lys Ile Ser Ala Ser Asn Glu Val Gly Glu Gly Pro Phe 
370 375 380 

Ser Asn Ser Val Glu Leu Ala Val Leu Pro Lys Glu Thr Ser Glu Ser 
385 390 395 400 

Asn Gln Arg Pro Lys Arg Leu Asp Ser Ala Asp Ala Lys Val Tyr Ser 
405 410 415 

Gly Tyr Tyr His Leu Asp Gln Lys Ser Met Thr Gly Ile Ala Val Gly 
420 425 430 

Val Gly Ile Ala Leu Thr Cys Ile Leu Ile Cys Val Leu Ile Leu Ile 
435 440 445 

Tyr Arg Ser Lys Ala Arg Lys Ser Ser Ala Ser Lys Thr Ala Gln Asn 
450 455 460 

Gly Thr Gln Gln Leu Pro Arg Thr Ser Ala Ser Leu Ala Ser Gly Asn 
465 470 475 480 

Glu Val Gly Lys Asn Leu Glu Gly Ala Val Gly Asn Glu Glu Ser Leu 
485 490 495 

Met Pro Met Ile Met Pro Asn Ser Phe Ile Asp Ala Lys Gly Gly Thr 
500 505 510 

Asp Leu Ile Ile Asn Ser Tyr Gly Pro Ile Ile Lys Asn Asn Ser Lys 
515 520 525 

Lys Lys Trp Phe Phe Phe Gln Asp Ser Lys Lys Ile Gln Val Glu Gln 
530 535 540 

Pro Gln Arg Arg Phe Thr Pro Ala Val Cys Phe Tyr Gln Pro Gly Thr 
545 550 555 560 

Thr Val Leu Ile Ser Asp Glu Asp Ser Pro Ser Ser Pro Gly Gln Thr 
565 570 575 

Thr Ser Phe Ser Arg Pro Phe Gly Val Ala Ala Asp Thr Glu His Ser 
580 585 590 

Ala Asn Ser Glu Gly Ser His Glu Thr Gly Asp Ser Gly Arg Phe Ser 
595 600 605 

His Glu Ser Asn Asp Glu Ile His Leu Ser Ser Val Ile Ser Thr Thr 
610 615 620 

Pro Pro Asn Leu 
625 

<210> SEQ ID NO 31 

<211> LENGTH: 3874 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 31 

tgcttctcgc gagcggccgt ccgagcacca gcctcgccgc cgcagagacg ctcgccacgc 60 
cggtgccgga gccggagcgg ggagccaggc tgcgtgcgac cagccgcaga gcagagagcg 120 

cccggggcgg gggccgcaga cggacagggg ctctgggcgg ccggggagca tgcccgcgcg 180 

gctacgctga atggcgcctc ctctgcgacc cctcgcccgg ctgcgaccgc cggggatgct 240 

gctccgcgcg ctcctgctcc tgctgmtgct cagtcctttg ccaggagtgt ggtgctttag 300 

cgaactgtct tttgtaaaag aaccacagga tgtaactgtc acaagaaagg acccagtcgt 360 

tttagattgc caggctcacg gagaagttcc tattaaggtc acatggttga aaaatggagc 420 

aaaaatgtct gaaaataaac ggatcgaggt tctttctaac ggctctttat acatcagtga 480 

ggtggaaggc aggcgaggag agcagtccga tgaaggattt tatcagtgct tggcaatgaa 540 

caaatatgga gccattctta gtcaaaaagc tcatcttgcc ttatcaacta tttctgcatt 600 

tgaagtccag ccaatttcca ctgaggtcca cgaaggtgga gttgctcgat ttgcatgcaa 660 

gatttcatcc caccctcctg cagtcataac atgggagttc aatcggacaa ctctacctat 720 

gactatggac aggataactg ccctaccaac aggagtattg cagatctatg atgtcagcca 780 

aagggattct ggaaattatc gttgtattgc tgccactgta gcccaccgac gtaaaagtat 840 

ggaggcctcg ctaactgtga ttccagctaa ggagtcaaaa tccttccaca caccarcaat 900 

tatagcaggt ccacagaaca taacaacatc tcttcatcag actgtagttt tggaatgcat 960 

ggccacagga aatcccaaac caatcatttc ttggagccgc cttgatcaca aatccattga 1020 

tgtctttaat actcgggtac ttggaaatgg taatctcatg atatctgatg tcaggctaca 1080 

acatgctgga gtatatgttt gtcgggccac tacccctggc acacgcaact ttacagttgc 1140 

tatggcaact ttaactgtat tagctcctcc ttcatttgtt gaatggccag aaagtttaac 1200 

aaggcctcga gctggcactg ctcgatttgt gtgtcaggca gaaggaatcc cctctcccaa 1260 

gatgtcatgg ttgaaaaatg gaaggaagat acattcgaat ggtagaatta aaatgtacaa 1320 

cagtaaattg gtaattaacc agattattcc tgaagatgat gctatttatc agtgcatggc 1380 

tgagaatagc caaggatcta ttttatctag agccagactg actgtagtga tgtcagaaga 1440 
cagacccagt gctccctata atgtacatgc tgaaaccatg tcaagctcag ccattctttt 1500 
agcctgggag aggccacttt ataattcaga caaagtcatt gcctattctg tacactacat 1560 
gaaagcagaa ggtttaaata atgaagagta tcaagtagtc atcggaaatg acacaactca 1620 
ttatattatt gatgacttag agcctgccag caattatact ttctacattg tagcatatat 1680 
gccaatggga gccagccaga tgtctgacca tgtgacacag aatactctag aggatgttcc 1740 
cctgagacct cctgaaatta gtttgacaag tcgaagtccc actgatattc tcatctcctg 1800 
gctgccaatc ccagccaaat atcggcgggg ccaagtggtg ctgtatcgct tgtctttccg 1860 
cctaagtact gagaattcaa tccaagttct ggagctcccg gggaccacgc atgagtacct 1920 
tttggaaggc ctgaaacctg acagtgtcta cctggttcgg attactgctg ccaccagagt 1980 
ggggctggga gagtcatcag tatggacttc acataggacg cccaaagcta caagcgtgaa 2040 
agcccctaag tctccagagt tgcatttgga gcctctgaac tgtaccacca tttctgtgag 2100 
gtggcagcaa gatgtagagg acacagctgc tattcagggc tacaagctgt actacaagga 2160 
agaagggcag caggagaatg ggcccatttt cttggatacc aaggacctac tctatactct 2220 
cagtggctta gaccccagaa gaaaatatca tgtgagactc ctggcttaca acaacataga 2280 
cgatggctat caggcagatc agactgtcag cactccagga tgcgtgtctg ttcgtgatcg 2340 
catggtccct cctccaccac caccccacca tctctatgcg aaggctaaca cctcatcttc 2400 
catcttcctg cactggagga ggcctgcatt caccgctgca caaatcatta actacaccat 2460 

ccgctgtaat cctgttggcc tgcagaatgc ttctttggtt ctgtaccttc aaacatcaga 2520 

aactcacatg ttggttcaag gtctagaacc aaacaccaaa tacgaatttg ccgttcgatt 2580 

acatgtggat cagctttcca gtccttggag ccctgtagtc taccattcta ctcttccaga 2640 

agcaccagca ggcccaccag ttggagtaaa agtgacatta atagaggatg acactgccct 2700 

ggtttcttgg aaaccccctg atggcccaga aacagttgtg acccgctata ctatcttata 2760 

tgcatctagg aaggcctgga ttgcaggaga gtggcaggtc ttacaccgtg aaggggcaat 2820 

aaccatggct ttgctagaaa acttggtagc aggaaatgtg tacattgtca agatatctgc 2880 

atccaatgag gtgggagaag gacccttttc aaattctgtg gagctggcag tacttccaaa 2940 

ggaaacctct gaatcaaatc agaggcccaa gcgtttagat tctgctgatg ccaaagttta 3000 

ttcaggatat taccatctgg accaaaaatc aatgactggc attgctgtag gtgttggcat 3060 

agccttgacc tgcatcctca tctgtgttct catcttgata taccgaagta aagccaggaa 3120 

atcatctgct tccaagacgg cacagaatgg aactcaacag ttacctcgta ccagtgcctc 3180 

cttagctagt ggaaatgagg taggaaagaa cctggaagga gctgtaggaa atgaagaatc 3240 

tttaatgcca atgatcatgc caaacagctt cattgatgca aagggaggaa ctgacctgat 3300 

aattaatagc tatggtccta taattaaaaa caactctaag aaaaagtggt tttttttcca 3360 

agactcaaag aagatacaag ttgagcagcc tcaaagaaga tttactccag cggtctgctt 3420 

ttaccagcca ggcaccactg tattaatcag tgatgaagac tcccctagct ccccaggtca 3480 

gacaaccagc ttctcaagac cctttggtgt tgcagctgat acagaacatt cagcaaatag 3540 

tgaaggcagc catgagactg gggattctgg gcggttttct catgagtcca acgatgagat 3600 

acatctgtcc tcagttataa gtaccacacc ccccaacctc tgattctttc actggcagtg 3660 

attcaggtgg agattccgca ttgaggaagt gtgaagaccc tgctgtgtca tctgttagtg 3720 

agcagacttc ctccttagtt ctgcagccgc catctgccat gctatgcttt gataaaaatg 3780 

attttccaat ctagacggcc atgctcaggt attctcacca ttaaatctgt tcgaaggaca 3840 
atgaacaggg aaccaaaaaa aaaaaaaaaa aaaa 3874 



What is claimed is: 

1. An isolated nucleic acid molecule comprising a 
nucleotide sequence that: 

(a) encodes the amino acid sequence shown in SEQ ID 
NO:8; and 

(b) hybridizes under stringent conditions to the nucleotide 
sequence of SEQ ID NO:7 or the complement thereof. 

2. An isolated nucleic acid molecule comprising a 
nucleotide sequence encoding the amino acid sequence shown in 
SEQ ID NO:8. 

***** 

HUMAN ATPASE PROTEINS AND 
POLYNUCLEOTIDES ENCODING THE 
SAME 

The present application claims the benefit of U.S. 
Provisional Application No. 60/164,624, which was filed on 
Nov. 10, 1999 and is herein incorporated by reference in its 
entirety. 

1. INTRODUCTION 

The present invention relates to the discovery, 
identification, and characterization of novel human 
polynucleotides encoding proteins that share sequence similarity 
with animal ATPase proteins. The invention encompasses 
the described polynucleotides, host cell expression systems, 
the encoded proteins, fusion proteins, polypeptides and 
peptides, antibodies to the encoded proteins and peptides, 
and genetically engineered animals that either lack or 
overexpress the disclosed genes, antagonists and agonists of the 
proteins, and other compounds that modulate the expression 
or activity of the proteins encoded by the disclosed genes 
that can be used for diagnosis, drug screening, clinical trial 
monitoring and the treatment of diseases and disorders. 

2. BACKGROUND OF THE INVENTION 

ATPases are proteins that mediate, facilitate, or "power" 
a wide variety of chemical processes within the cell. For 
example, ATPases have been associated with enzymatic, 
catabolic, and metabolic processes as well as transport 
mechanisms, blood coagulation, phagocytosis, etc. 

3. SUMMARY OF THE INVENTION 

The present invention relates to the discovery, 
identification, and characterization of nucleotides that 
encode novel human proteins, and the corresponding amino 
acid sequences of these proteins. The novel human proteins 
(NHPs) described for the first time herein share structural 
similarity with animal ATPases. 

The novel human nucleic acid sequences described 
herein encode alternative proteins/open reading frames 
(ORFs) of 972, 124, 1,056, 208, 1,270, 422, 1,426, and 578 
amino acids in length (see SEQ ID NOS: 2, 4, 6, 8, 10, 12, 
14, and 16, respectively). 
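
As a rough illustration of how the ORF lengths above relate to coding-sequence lengths, the following Python sketch computes the number of nucleotides needed to encode each ORF. It assumes each coding sequence consists of one codon per residue plus a single stop codon. The function name `coding_length_nt` is hypothetical, and the nucleotide lengths it prints are arithmetic consequences of that assumption, not values taken from the Sequence Listing.

```python
def coding_length_nt(n_amino_acids: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode an ORF of n_amino_acids residues.

    Assumption: three nucleotides per residue, plus one three-nucleotide
    stop codon when include_stop is True.
    """
    codons = n_amino_acids + (1 if include_stop else 0)
    return 3 * codons


# ORF lengths (in amino acids) as recited in the Summary above.
ORF_LENGTHS_AA = [972, 124, 1056, 208, 1270, 422, 1426, 578]

if __name__ == "__main__":
    for aa in ORF_LENGTHS_AA:
        print(f"{aa} aa -> {coding_length_nt(aa)} nt (stop codon included)")
```

The same relationship can be used as a quick consistency check between a DNA entry and its paired protein entry in a sequence listing, provided the DNA entry contains only the coding region.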

The invention also encompasses agonists and antagonists 
of the described NHPs, including small molecules, large 
molecules, mutant NHPs, or portions thereof that compete 
with native NHP, peptides, and antibodies, as well as 
nucleotide sequences that can be used to inhibit the expression of 
the described NHPs (e.g., antisense and ribozyme 
molecules, and gene or regulatory sequence replacement 
constructs) or to enhance the expression of the described 
NHP genes (e.g., expression constructs that place the 
described gene under the control of a strong promoter 
system), and transgenic animals that express an NHP 
transgene, or "knock-outs" (which can be conditional) that 
do not express a functional NHP. A knockout ES cell line has 
been produced that contains a gene trap mutation in the 
murine ortholog of the described locus. 

Further, the present invention also relates to processes for 
identifying compounds that modulate, i.e., act as agonists or 
antagonists of, NHP expression and/or NHP activity that 
utilize purified preparations of the described NHPs and/or 
NHP product, or cells expressing the same. Such compounds 
can be used as therapeutic agents for the treatment of any of 
a wide variety of symptoms associated with biological 
disorders or imbalances. 

4. DESCRIPTION OF THE SEQUENCE LISTING 
AND FIGURES 

The Sequence Listing provides the sequences of the 
described NHP ORFs that encode the described NHP amino 
acid sequences. SEQ ID NO: 17 describes an NHP ORF as 
well as flanking 5' and 3' sequences. 
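
The relationship between an ORF and the amino acid sequence it encodes can be sketched in Python as follows. This is a minimal illustration that assumes the standard genetic code and a sequence already in reading frame. The `translate` helper and the short test fragment are hypothetical and are not taken from the NHP sequences.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes.

    Whitespace is ignored, translation stops at the first stop codon,
    and any codon containing a non-ACGT symbol is reported as 'X'.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    # Illustrative fragment only: Met Glu Gly Arg Tyr Ile, then a stop codon.
    print(translate("atggaaggaa gatacattta a"))
```

Because flanking 5' and 3' sequences are not translated, a sequence that includes them would first need to be trimmed to the ORF (from the initiating atg through the stop codon) before being passed to a helper of this kind.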

5. DETAILED DESCRIPTION OF THE 
INVENTION 

The NHPs, described for the first time herein, are novel 
proteins that are expressed in, inter alia, human cell lines, 
predominantly in human kidney and placenta, as well as 
human fetal brain, brain, pituitary, cerebellum, spinal cord, 
thymus, spleen, lymph node, bone marrow, trachea, fetal 
liver, prostate, testis, thyroid, adrenal gland, salivary gland, 
stomach, small intestine, colon, uterus, mammary gland, 
adipose, esophagus, bladder, cervix, rectum, ovary, fetal 
kidney, fetal lung and gene trapped human cells. 

The present invention encompasses the nucleotides 
presented in the Sequence Listing, host cells expressing such 
nucleotides, the expression products of such nucleotides, 
and: (a) nucleotides that encode mammalian homologs of 
the described genes, including the specifically described 
NHPs, and the NHP products; (b) nucleotides that encode 
one or more portions of the NHPs that correspond to 
functional domains, and the polypeptide products specified 
by such nucleotide sequences, including but not limited to 
the novel regions of any active domain(s); (c) isolated 
nucleotides that encode mutant versions, engineered or 
naturally occurring, of the described NHPs in which all or a 
part of at least one domain is deleted or altered, and the 
polypeptide products specified by such nucleotide 
sequences, including but not limited to soluble proteins and 
peptides in which all or a portion of the signal sequence is 
deleted; (d) nucleotides that encode chimeric fusion proteins 
containing all or a portion of a coding region of an NHP, or 
one of its domains (e.g., a receptor or ligand binding domain, 
accessory protein/self-association domain, etc.) fused to 
another peptide or polypeptide; or (e) therapeutic or 
diagnostic derivatives of the described polynucleotides such as 
oligonucleotides, antisense polynucleotides, ribozymes, 
dsRNA, or gene therapy constructs comprising a sequence 
first disclosed in the Sequence Listing. 

As discussed above, the present invention includes: (a) 
the human DNA sequences presented in the Sequence 
Listing (and vectors comprising the same) and additionally 
contemplates any nucleotide sequence encoding a contiguous 
NHP open reading frame (ORF) that hybridizes to a 
complement of a DNA sequence presented in the Sequence 
Listing under highly stringent conditions, e.g., hybridization 
to filter-bound DNA in 0.5 M NaHPO4, 7% sodium dodecyl 
sulfate (SDS), 1 mM EDTA at 65° C., and washing in 
0.1xSSC/0.1% SDS at 68° C. (Ausubel F. M. et al., eds., 
1989, Current Protocols in Molecular Biology, Vol. I, Green 
Publishing Associates, Inc., and John Wiley & Sons, Inc., 
New York, at p. 2.10.3) and encodes a functionally 
equivalent gene product. Additionally contemplated are any 
nucleotide sequences that hybridize to the complement of a DNA 
sequence that encodes and expresses an amino acid 
sequence presented in the Sequence Listing under 
moderately stringent conditions, e.g., washing in 0.2xSSC/0.1% 
SDS at 42° C. (Ausubel et al., 1989, supra), yet still encodes 
a functionally equivalent NHP product. Functional 
equivalents of a NHP include naturally occurring NHPs present in 
other species and mutant NHPs whether naturally occurring 
or engineered (by site directed mutagenesis, gene shuffling, 
directed evolution as described in, for example, U.S. Pat. 
No. 5,837,458). The invention also includes degenerate 
nucleic acid variants of the disclosed NHP polynucleotide 
sequences. 

Additionally contemplated are polynucleotides encoding 
NHP ORFs, or their functional equivalents, encoded by 
polynucleotide sequences that are about 99, 95, 90, or about 
85 percent similar or identical to corresponding regions of 
the nucleotide sequences of the Sequence Listing (as 
measured by BLAST sequence comparison analysis using, for 
example, the GCG sequence analysis package using 
standard default settings). 
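
The percent-identity thresholds above (about 99, 95, 90, or 85 percent) can be illustrated with a minimal sketch. It is not BLAST and not the GCG package; it assumes the two sequences have already been aligned to equal length with "-" marking gaps, and the function names `percent_identity` and `meets_threshold` are hypothetical names chosen for illustration only.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned, gap-free columns in which both sequences agree.

    Assumes `a` and `b` are pre-aligned (equal length, '-' for gaps).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def meets_threshold(a: str, b: str, threshold: float = 85.0) -> bool:
    """True if the aligned pair is at least `threshold` percent identical."""
    return percent_identity(a, b) >= threshold
```

A real comparison would first produce the alignment with an alignment tool; this sketch only shows how the percentage itself is computed once an alignment exists.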

The invention also includes nucleic acid molecules, 
preferably DNA molecules, that hybridize to, and are therefore 
the complements of, the described NHP gene nucleotide 
sequences. Such hybridization conditions may be highly 
stringent or less highly stringent, as described above. In 
instances where the nucleic acid molecules are 
deoxyoligonucleotides ("DNA oligos"), such molecules are generally 
about 16 to about 100 bases long, or about 20 to about 80, 
or about 34 to about 45 bases long, or any variation or 
combination of sizes represented therein that incorporate a 
contiguous region of sequence first disclosed in the 
Sequence Listing. Such oligonucleotides can be used in 
conjunction with the polymerase chain reaction (PCR) to 
screen libraries, isolate clones, and prepare cloning and 
sequencing templates, etc. 

Alternatively, such NHP oligonucleotides can be used as 
hybridization probes for screening libraries, and assessing 
gene expression patterns (particularly using a microarray or 
high-throughput "chip" format). Additionally, a series of the 
described NHP oligonucleotide sequences, or the 
complements thereof, can be used to represent all or a portion of the 
described NHP sequences. The oligonucleotides, typically 
between about 16 to about 40 (or any whole number within 
the stated range) nucleotides in length can partially overlap 
each other and/or the NHP sequence may be represented 
using oligonucleotides that do not overlap. Accordingly, the 
described NHP polynucleotide sequences shall typically 
comprise at least about two or three distinct oligonucleotide 
sequences of at least about 18, and preferably about 25, 
nucleotides in length that are each first disclosed in the 
described Sequence Listing. Such oligonucleotide 
sequences may begin at any nucleotide present within a 
sequence in the Sequence Listing and proceed in either a 
sense (5'-to-3') orientation vis-a-vis the described sequence 
or in an antisense orientation. 
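
The idea of representing a sequence with a series of overlapping or non-overlapping oligonucleotides, in sense or antisense orientation, can be sketched as follows. This is an illustrative sketch only; `tile_oligos` and `reverse_complement` are hypothetical helper names, the input is assumed to contain only unambiguous A, C, G and T bases, and the example sequence in the usage note is made up.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the antisense strand, written 5'-to-3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_oligos(seq: str, length: int = 25, step=None, antisense: bool = False):
    """Split `seq` into oligos of `length` bases, starting every `step` bases.

    step < length  -> partially overlapping oligos
    step == length -> non-overlapping oligos (the default)
    A trailing fragment shorter than `length` is not emitted.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    if step is None:
        step = length
    if step <= 0:
        raise ValueError("step must be positive")
    oligos = [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]
    if antisense:
        oligos = [reverse_complement(o) for o in oligos]
    return oligos
```

For example, `tile_oligos("ACGTACGTACGT", length=4, step=2)` yields five partially overlapping 4-mers, while `step=4` yields three non-overlapping ones.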

For oligonucleotide probes, highly stringent conditions 
may refer, e.g., to washing in 6xSSC/0.05% sodium 
pyrophosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base 
oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base 
oligos). These nucleic acid molecules may encode or act as 
NHP gene antisense molecules, useful, for example, in NHP 
gene regulation (for and/or as antisense primers in 
amplification reactions of NHP gene nucleic acid sequences). With 
respect to NHP gene regulation, such techniques can be used 
to regulate biological functions. Further, such sequences 
may be used as part of ribozyme and/or triple helix 
sequences that are also useful for NHP gene regulation. 

Inhibitory antisense or double stranded oligonucleotides 
can additionally comprise at least one modified base moiety 
which is selected from the group including but not limited to 
5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil, 
hypoxanthine, xanthine, 4-acetylcytosine, 
5-(carboxyhydroxylmethyl) uracil, 
5-carboxymethylaminomethyl-2-thiouridine, 
5-carboxymethylaminomethyluracil, dihydrouracil, 
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine, 
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine, 
2-methyladenine, 2-methylguanine, 3-methylcytosine, 
5-methylcytosine, N6-adenine, 7-methylguanine, 
5-methylaminomethyluracil, 
5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine, 
5'-methoxycarboxymethyluracil, 5-methoxyuracil, 
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic 
acid (v), wybutoxosine, pseudouracil, queosine, 
2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil, 
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid 
methylester, uracil-5-oxyacetic acid (v), 
5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, 
(acp3)w, and 2,6-diaminopurine. 

The antisense oligonucleotide can also comprise at least 
one modified sugar moiety selected from the group includ- 
ing but not limited to arabinose, 2-fluoroarabinose, xylulose, 
and hexose. 

In yet another embodiment, the antisense oligonucleotide 
will comprise at least one modified phosphate backbone 
selected from the group consisting of a phosphorothioate, a 
phosphorodithioate, a phosphoramidothioate, a 
phosphoramidate, a phosphordiamidate, a 
methylphosphonate, an alkyl phosphotriester, and a formac- 
etal or analog thereof. 

In yet another embodiment, the antisense oligonucleotide 
is an α-anomeric oligonucleotide. An α-anomeric 
oligonucleotide forms specific double-stranded hybrids with 
complementary RNA in which, contrary to the usual β-units, 
the strands run parallel to each other (Gautier et al., 1987, 
Nucl. Acids Res. 15:6625-6641). The oligonucleotide is a 
2'-O-methylribonucleotide (Inoue et al., 1987, Nucl. Acids 
Res. 15:6131-6148), or a chimeric RNA-DNA analogue 
(Inoue et al., 1987, FEBS Lett. 215:327-330). Alternatively, 
double stranded RNA can be used to disrupt the expression 
and function of a targeted NHP. 

Oligonucleotides of the invention can be synthesized by 
standard methods known in the art, e.g. by use of an 
automated DNA synthesizer (such as are commercially 
available from Biosearch, Applied Biosystems, etc.). As 
examples, phosphorothioate oligonucleotides can be synthe- 
sized by the method of Stein et al. (1988, Nucl. Acids Res. 
16:3209), and methylphosphonate oligonucleotides can be 
prepared by use of controlled pore glass polymer supports 
(Sarin et al., 1988, Proc. Natl. Acad. Sci. U.S.A. 
85:7448-7451), etc. 

Low stringency conditions are well known to those of 
skill in the art, and will vary predictably depending on the 
specific organisms from which the library and the labeled 
sequences are derived. For guidance regarding such condi- 
tions see, for example, Sambrook et al., 1989, Molecular 
Cloning, A Laboratory Manual (and periodic updates 
thereof), Cold Spring Harbor Press, N.Y.; and Ausubel et 
al., 1989, Current Protocols in Molecular Biology, Green 
Publishing Associates and Wiley Interscience, N.Y. 

Alternatively, suitably labeled NHP nucleotide probes can 
be used to screen a human genomic library using appropri- 
ately stringent conditions or by PCR. The identification and 
characterization of human genomic clones is helpful for 
identifying polymorphisms (including, but not limited to, 
nucleotide repeats, microsatellite alleles, single nucleotide 
polymorphisms, or coding single nucleotide 
polymorphisms), determining the genomic structure of a 
given locus/allele, and designing diagnostic tests. For 
example, sequences derived from regions adjacent to the 
intron/exon boundaries of the human gene can be used to 
design primers for use in amplification assays to detect 
mutations within the exons, introns, splice sites (e.g., splice 
acceptor and/or donor sites), etc., that can be used in 
diagnostics and pharmacogenomics. 

Further, a NHP gene homolog can be isolated from 
nucleic acid from an organism of interest by performing 
PCR using two degenerate or "wobble" oligonucleotide 
primer pools designed on the basis of amino acid sequences 
within the NHP products disclosed herein. The template for 
the reaction may be total RNA, mRNA, and/or cDNA 
obtained by reverse transcription of mRNA prepared from 
human or non-human cell lines or tissue known or suspected 
to express an allele of a NHP gene. The PCR product can be 
subcloned and sequenced to ensure that the amplified 
sequences represent the sequence of the desired NHP gene. 
The PCR fragment can then be used to isolate a full length 
cDNA clone by a variety of methods. For example, the 
amplified fragment can be labeled and used to screen a 
cDNA library, such as a bacteriophage cDNA library. 
Alternatively, the labeled fragment can be used to isolate 
genomic clones via the screening of a genomic library. 
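
The degenerate or "wobble" primer pools mentioned above follow from the redundancy of the standard genetic code: each amino acid in a short peptide can be back-translated into every codon that encodes it. The sketch below enumerates such a pool for the sense strand only. The names `CODONS`, `degenerate_pool` and `pool_size` are hypothetical, the peptides in the usage note are made-up examples rather than sequences from the Sequence Listing, and practical primer design would also weigh codon usage, melting temperature and pool complexity.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (one-letter code) to the list of codons encoding it.
CODONS = {}
for codon, aa in zip(("".join(c) for c in product(BASES, repeat=3)), AMINO):
    CODONS.setdefault(aa, []).append(codon)


def pool_size(peptide: str) -> int:
    """Number of distinct DNA sequences that encode `peptide`."""
    size = 1
    for aa in peptide.upper():
        size *= len(CODONS[aa])
    return size


def degenerate_pool(peptide: str):
    """Every sense-strand DNA sequence that encodes `peptide`."""
    choices = [CODONS[aa] for aa in peptide.upper()]
    return ["".join(combo) for combo in product(*choices)]
```

For instance, `pool_size("MW")` is 1 because methionine and tryptophan each have a single codon, whereas a peptide rich in leucine, serine or arginine produces a much larger pool, which is why low-degeneracy stretches are usually preferred when choosing primer sites.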

PCR technology can also be used to isolate full length 
cDNA sequences. For example, RNA can be isolated, fol- 
lowing standard procedures, from an appropriate cellular or 
tissue source (i.e., one known, or suspected, to express a 
NHP gene). A reverse transcription (RT) reaction can be 
performed on the RNA using an oligonucleotide primer 
specific for the most 5' end of the amplified fragment for the 
priming of first strand synthesis. The resulting RNA/DNA 
hybrid may then be "tailed" using a standard terminal 
transferase reaction, the hybrid may be digested with RNase 
H, and second strand synthesis may then be primed with a 
complementary primer. Thus, cDNA sequences upstream of 
the amplified fragment can be isolated. For a review of 
cloning strategies that can be used, see e.g., Sambrook et al., 
1989, supra. 

A cDNA encoding a mutant NHP gene can be isolated, for 
example, by using PCR. In this case, the first cDNA strand 
may be synthesized by hybridizing an oligo-dT oligonucle- 
otide to mRNA isolated from tissue known or suspected to 
be expressed in an individual putatively carrying a mutant 
NHP allele, and by extending the new strand with reverse 
transcriptase. The second strand of the cDNA is then syn- 
thesized using an oligonucleotide that hybridizes specifi- 
cally to the 5' end of the normal gene. Using these two 
primers, the product is then amplified via PCR, optionally 
cloned into a suitable vector, and subjected to DNA 
sequence analysis through methods well known to those of 
skill in the art. By comparing the DNA sequence of the 
mutant NHP allele to that of a corresponding normal NHP 
allele, the mutation(s) responsible for the loss or alteration 
of function of the mutant NHP gene product can be ascer- 
tained. 
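
The comparison of a mutant allele with the corresponding normal allele can be sketched in its simplest form as a position-by-position scan. The function name `point_differences` is hypothetical, and the sketch assumes the two sequences are the same length and already in register, so it reports substitutions only; insertions and deletions would require a proper sequence alignment first.

```python
def point_differences(normal: str, mutant: str):
    """List (position, normal_base, mutant_base) for each substitution.

    Positions are 1-based. Both sequences must be the same length and
    in register; this does not detect insertions or deletions.
    """
    if len(normal) != len(mutant):
        raise ValueError("sequences differ in length; align them first")
    return [
        (i + 1, n, m)
        for i, (n, m) in enumerate(zip(normal.upper(), mutant.upper()))
        if n != m
    ]
```

Each reported position can then be mapped onto the reading frame to decide whether the change is silent, missense or nonsense.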

Alternatively, a genomic library can be constructed using 
DNA obtained from an individual suspected of or known to 
carry a mutant NHP allele (e.g., a person manifesting a 
NHP-associated phenotype such as, for example, obesity, 
high blood pressure, connective tissue disorders, infertility, 
etc.), or a cDNA library can be constructed using RNA from 
a tissue known, or suspected, to express a mutant NHP 
allele. A normal NHP gene, or any suitable fragment thereof, 
can then be labeled and used as a probe to identify the 
corresponding mutant NHP allele in such libraries. Clones 
containing mutant NHP gene sequences can then be purified 
and subjected to sequence analysis according to methods 
well known to those skilled in the art. 
Additionally, an expression library can be constructed 
utilizing cDNA synthesized from, for example, RNA 
isolated from a tissue known, or suspected, to express a mutant 
NHP allele in an individual suspected of or known to carry 
such a mutant allele. In this manner, gene products made by 
the putatively mutant tissue can be expressed and screened 
using standard antibody screening techniques in conjunction 
with antibodies raised against a normal NHP product, as 
described below. (For screening techniques, see, for 
example, Harlow, E. and Lane, eds., 1988, "Antibodies: A 
Laboratory Manual", Cold Spring Harbor Press, Cold Spring 
Harbor.) 

Additionally, screening can be accomplished by screening 
with labeled NHP fusion proteins, such as, for example, 
alkaline phosphatase-NHP or NHP-alkaline phosphatase 
fusion proteins. In cases where a NHP mutation results in an 
expressed gene product with altered function (e.g., as a 
result of a missense or a frameshift mutation), polyclonal 
antibodies to a NHP are likely to cross-react with a 
corresponding mutant NHP gene product. Library clones detected 
via their reaction with such labeled antibodies can be 
purified and subjected to sequence analysis according to 
methods well known in the art. 

The invention also encompasses (a) DNA vectors that 
contain any of the foregoing NHP coding sequences and/or 
their complements (i.e., antisense); (b) DNA expression 
vectors that contain any of the foregoing NHP coding 
sequences operatively associated with a regulatory element 
that directs the expression of the coding sequences (for 
example, baculovirus as described in U.S. Pat. No. 
5,869,336 herein incorporated by reference); (c) genetically 
engineered host cells that contain any of the foregoing NHP 
coding sequences operatively associated with a regulatory 
element that directs the expression of the coding sequences 
in the host cell; and (d) genetically engineered host cells that 
express an endogenous NHP gene under the control of an 
exogenously introduced regulatory element (i.e., gene 
activation). As used herein, regulatory elements include, but 
are not limited to, inducible and non-inducible promoters, 
enhancers, operators and other elements known to those 
skilled in the art that drive and regulate expression. Such 
regulatory elements include but are not limited to the human 
cytomegalovirus (hCMV) immediate early gene, 
regulatable, viral elements (particularly retroviral LTR 
promoters), the early or late promoters of SV40 adenovirus, 
the lac system, the trp system, the TAC system, the TRC 
system, the major operator and promoter regions of phage 
lambda, the control regions of fd coat protein, the promoter 
for 3-phosphoglycerate kinase (PGK), the promoters of acid 
phosphatase, and the promoters of the yeast α-mating 
factors. 

The present invention also encompasses antibodies and 
anti-idiotypic antibodies (including Fab fragments), 
antagonists and agonists of the NHP, as well as compounds or 
nucleotide constructs that inhibit expression of a NHP gene 
(transcription factor inhibitors, antisense and ribozyme 
molecules, or gene or regulatory sequence replacement 
constructs), or promote the expression of a NHP (e.g., 
expression constructs in which NHP coding sequences are 
operatively associated with expression control elements 
such as promoters, promoter/enhancers, etc.). 

The NHPs or NHP peptides, NHP fusion proteins, NHP 
nucleotide sequences, antibodies, antagonists and agonists 
can be useful for the detection of mutant NHPs or 
inappropriately expressed NHPs for the diagnosis of disease. The 
NHP proteins or peptides, NHP fusion proteins, NHP 
nucleotide sequences, host cell expression systems, antibodies, 
antagonists, agonists and genetically engineered cells and 
animals can be used for screening for drugs (or high 
throughput screening of combinatorial libraries) effective in 
the treatment of the symptomatic or phenotypic 
manifestations of perturbing the normal function of NHP in the body. 
The use of engineered host cells and/or animals may offer an 
advantage in that such systems allow not only for the 
identification of compounds that bind to the endogenous 
receptor for an NHP, but can also identify compounds that 
trigger NHP-mediated activities or pathways. 

Finally, the NHP products can be used as therapeutics. For 
example, soluble derivatives such as NHP peptides/domains 
corresponding to the NHPs, NHP fusion protein products 
(especially NHP-Ig fusion proteins, i.e., fusions of a NHP, or 
a domain of a NHP, to an IgFc), NHP antibodies and 
anti-idiotypic antibodies (including Fab fragments), 
antagonists or agonists (including compounds that modulate or act 
on downstream targets in a NHP-mediated pathway) can be 
used to directly treat diseases or disorders. For instance, the 
administration of an effective amount of soluble NHP, or a 
NHP-IgFc fusion protein or an anti-idiotypic antibody (or its 
Fab) that mimics the NHP could activate or effectively 
antagonize the endogenous NHP receptor. Nucleotide 
constructs encoding such NHP products can be used to 
genetically engineer host cells to express such products in vivo; 
these genetically engineered cells function as "bioreactors" 
in the body delivering a continuous supply of a NHP, a NHP 
peptide, or a NHP fusion protein to the body. Nucleotide 
constructs encoding functional NHPs, mutant NHPs, as well 
as antisense and ribozyme molecules can also be used in 
"gene therapy" approaches for the modulation of NHP 
expression. Thus, the invention also encompasses 
pharmaceutical formulations and methods for treating biological 
disorders. 

Various aspects of the invention are described in greater 
detail in the subsections below. 

5.1 The NHP Sequences 

The cDNA sequences and the corresponding deduced 
amino acid sequences of the described NHPs are presented 
in the Sequence Listing. The NHP nucleotides were obtained 
from clustered human gene trapped sequences, ESTs and a 
human placenta cDNA library (Edge Biosystems, 
Gaithersburg, Md.). The described sequences share 
structural similarity with calcium transporting ATPases and 
aminophospholipid transporters. 

5.2 NHPs and NHP Polypeptides 

NHPs, polypeptides, peptide fragments, mutated, 
truncated, or deleted forms of the NHPs, and/or NHP fusion 
proteins can be prepared for a variety of uses. These uses 
include, but are not limited to, the generation of antibodies, 
as reagents in diagnostic assays, for the identification of 
other cellular gene products related to a NHP, as reagents in 
assays for screening for compounds that can be used as 
pharmaceutical reagents useful in the therapeutic treatment of 
mental, biological, or medical disorders and disease. Given 
the similarity information and expression data, the described 
NHPs can be targeted (by drugs, oligos, antibodies, etc.) in 
order to treat disease, or to therapeutically augment the 
efficacy of therapeutic agents. 

The Sequence Listing discloses the amino acid sequences 
encoded by the described NHP genes. The NHPs typically 
display initiator methionines in DNA sequence contexts 
consistent with a translation initiation site. 

The NHP amino acid sequences of the invention include 
the amino acid sequence presented in the Sequence Listing 
as well as analogues and derivatives thereof. Further, 
corresponding NHP homologues from other species are 
encompassed by the invention. In fact, any NHP protein encoded 
by the NHP nucleotide sequences described above is within 
the scope of the invention, as are any novel polynucleotide 
sequences encoding all or any novel portion of an amino 
acid sequence presented in the Sequence Listing. The 
degenerate nature of the genetic code is well known, and, 
accordingly, each amino acid presented in the Sequence 
Listing is generally representative of the well known 
nucleic acid "triplet" codon, or in many cases codons, that 
can encode the amino acid. As such, as contemplated herein, 
the amino acid sequences presented in the Sequence Listing, 
when taken together with the genetic code (see, for example, 
Table 4-1 at page 109 of "Molecular Cell Biology", 1986, J. 
Darnell et al. eds., Scientific American Books, New York, 
N.Y., herein incorporated by reference) are generally 
representative of all the various permutations and combinations 
of nucleic acid sequences that can encode such amino acid 
sequences. 

The invention also encompasses proteins that are 
functionally equivalent to the NHPs encoded by the presently 
described nucleotide sequences as judged by any of a 
number of criteria, including, but not limited to, the ability 
to bind and cleave a substrate of a NHP, or the ability to 
effect an identical or complementary downstream pathway, 
or a change in cellular metabolism (e.g., proteolytic activity, 
ion flux, tyrosine phosphorylation, transport, etc.). Such 
functionally equivalent NHP proteins include, but are not 
limited to, additions or substitutions of amino acid residues 
within the amino acid sequence encoded by the NHP 
nucleotide sequence described above, but which result in a silent 
change, thus producing a functionally equivalent gene 
product. Amino acid substitutions may be made on the basis of 
similarity in polarity, charge, solubility, hydrophobicity, 
hydrophilicity, and/or the amphipathic nature of the residues 
involved. For example, nonpolar (hydrophobic) amino acids 
include alanine, leucine, isoleucine, valine, proline, 
phenylalanine, tryptophan, and methionine; polar neutral 
amino acids include glycine, serine, threonine, cysteine, 
tyrosine, asparagine, and glutamine; positively charged 
(basic) amino acids include arginine, lysine, and histidine; 
and negatively charged (acidic) amino acids include aspartic 
acid and glutamic acid. 

A variety of host-expression vector systems can be used 
to express the NHP nucleotide sequences of the invention. 
Where, as in the present instance, the NHP peptide or 
polypeptide is thought to be a membrane protein, the 
hydrophobic regions of the protein can be excised and the 
resulting soluble peptide or polypeptide can be recovered from the 
culture media. Such expression systems also encompass 
engineered host cells that express a NHP, or functional 
equivalent, in situ. Purification or enrichment of a NHP from 
such expression systems can be accomplished using 
appropriate detergents and lipid micelles and methods well known 
to those skilled in the art. However, such engineered host 
cells themselves may be used in situations where it is 
important not only to retain the structural and functional 
characteristics of the NHP, but to assess biological activity, 
e.g., in drug screening assays. 

The expression systems that may be used for purposes of 
the invention include but are not limited to microorganisms 
such as bacteria (e.g., E. coli, B. subtilis) transformed with 
recombinant bacteriophage DNA, plasmid DNA or cosmid 
DNA expression vectors containing NHP nucleotide 
sequences; yeast (e.g., Saccharomyces, Pichia) transformed 
with recombinant yeast expression vectors containing NHP 
nucleotide sequences; insect cell systems infected with 
recombinant virus expression vectors (e.g., baculovirus) 
containing NHP sequences; plant cell systems infected with 
recombinant virus expression vectors (e.g., cauliflower 
mosaic virus, CaMV; tobacco mosaic virus, TMV) or 
transformed with recombinant plasmid expression vectors (e.g., 
Ti plasmid) containing NHP nucleotide sequences; or 
mammalian cell systems (e.g., COS, CHO, BHK, 293, 3T3) 
harboring recombinant expression constructs containing 
promoters derived from the genome of mammalian cells 
(e.g., metallothionein promoter) or from mammalian viruses 
(e.g., the adenovirus late promoter; the vaccinia virus 7.5K 
promoter). 

In bacterial systems, a number of expression vectors may 
be advantageously selected depending upon the use intended 
for the NHP product being expressed. For example, when a 
large quantity of such a protein is to be produced for the 
generation of pharmaceutical compositions of or containing 
NHP, or for raising antibodies to a NHP, vectors that direct 
the expression of high levels of fusion protein products that 
are readily purified may be desirable. Such vectors include, 
but are not limited to, the E. coli expression vector pUR278 
(Ruther et al., 1983, EMBO J. 2:1791), in which a NHP 
coding sequence may be ligated individually into the vector 
in frame with the lacZ coding region so that a fusion protein 
is produced; pIN vectors (Inouye & Inouye, 1985, Nucleic 
Acids Res. 13:3101-3109; Van Heeke & Schuster, 1989, J. 
Biol. Chem. 264:5503-5509); and the like. pGEX vectors 
(Pharmacia or American Type Culture Collection) can also 
be used to express foreign polypeptides as fusion proteins 
with glutathione S-transferase (GST). In general, such 
fusion proteins are soluble and can easily be purified from 
lysed cells by adsorption to glutathione-agarose beads 
followed by elution in the presence of free glutathione. The 
pGEX vectors are designed to include thrombin or factor Xa 
protease cleavage sites so that the cloned target gene product 
can be released from the GST moiety. 

In an insect system, Autographa californica nuclear 
polyhedrosis virus (AcNPV) is used as a vector to express foreign 
genes. The virus grows in Spodoptera frugiperda cells. A 
NHP gene coding sequence may be cloned individually into 
non-essential regions (for example the polyhedrin gene) of 
the virus and placed under control of an AcNPV promoter 
(for example the polyhedrin promoter). Successful insertion 
of NHP gene coding sequence will result in inactivation of 
the polyhedrin gene and production of non-occluded 
recombinant virus (i.e., virus lacking the proteinaceous coat coded 
for by the polyhedrin gene). These recombinant viruses are 
then used to infect Spodoptera frugiperda cells in which the 
inserted gene is expressed (e.g., see Smith et al., 1983, J. 
Virol. 46:584; Smith, U.S. Pat. No. 4,215,051). 

In mammalian host cells, a number of viral-based 
expression systems may be utilized. In cases where an adenovirus 
is used as an expression vector, the NHP nucleotide 
sequence of interest may be ligated to an adenovirus 
transcription/translation control complex, e.g., the late 
promoter and tripartite leader sequence. This chimeric gene 
may then be inserted in the adenovirus genome by in vitro 
or in vivo recombination. Insertion in a non-essential region 
of the viral genome (e.g., region E1 or E3) will result in a 
recombinant virus that is viable and capable of expressing a 
NHP product in infected hosts (e.g., See Logan & Shenk, 
1984, Proc. Natl. Acad. Sci. USA 81:3655-3659). Specific 
initiation signals may also be required for efficient 
translation of inserted NHP nucleotide sequences. These signals 
include the ATG initiation codon and adjacent sequences. In 
cases where an entire NHP gene or cDNA, including its own 
initiation codon and adjacent sequences, is inserted into the 
appropriate expression vector, no additional translational 
control signals may be needed. However, in cases where 
only a portion of a NHP coding sequence is inserted, 
exogenous translational control signals, including, perhaps, 
the ATG initiation codon, must be provided. Furthermore, 
the initiation codon must be in phase with the reading frame 
of the desired coding sequence to ensure translation of the 
entire insert. These exogenous translational control signals 
and initiation codons can be of a variety of origins, both 
natural and synthetic. The efficiency of expression may be 
enhanced by the inclusion of appropriate transcription 
enhancer elements, transcription terminators, etc. (See 
Bittner et al., 1987, Methods in Enzymol. 153:516-544). 

In addition, a host cell strain may be chosen that 
modulates the expression of the inserted sequences, or modifies 
and processes the gene product in the specific fashion 
desired. Such modifications (e.g., glycosylation) and 
processing (e.g., cleavage) of protein products may be 
important for the function of the protein. Different host cells have 
characteristic and specific mechanisms for the 
post-translational processing and modification of proteins and 
gene products. Appropriate cell lines or host systems can be 
chosen to ensure the correct modification and processing of 
the foreign protein expressed. To this end, eukaryotic host 
cells which possess the cellular machinery for proper 
processing of the primary transcript, glycosylation, and 
phosphorylation of the gene product may be used. Such 
mammalian host cells include, but are not limited to, CHO, 
VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and in 
particular, human cell lines. 

For long-term, high-yield production of recombinant 
proteins, stable expression is preferred. For example, cell 
lines which stably express the NHP sequences described 
above can be engineered. Rather than using expression 
vectors which contain viral origins of replication, host cells 
can be transformed with DNA controlled by appropriate 
expression control elements (e.g., promoter, enhancer 
sequences, transcription terminators, polyadenylation sites, 
etc.), and a selectable marker. Following the introduction of 
the foreign DNA, engineered cells may be allowed to grow 
for 1-2 days in an enriched media, and then are switched to 
a selective media. The selectable marker in the recombinant 
plasmid confers resistance to the selection and allows cells 
to stably integrate the plasmid into their chromosomes and 
grow to form foci which in turn can be cloned and expanded 
into cell lines. This method may advantageously be used to 
engineer cell lines which express the NHP product. Such 
engineered cell lines may be particularly useful in screening 
and evaluation of compounds that affect the endogenous 
activity of the NHP product. 

A number of selection systems may be used, including but 
not limited to the herpes simplex virus thymidine kinase 
(Wigler, et al., 1977, Cell 11:223), hypoxanthine-guanine 
phosphoribosyltransferase (Szybalska & Szybalski, 1962, 
Proc. Natl. Acad. Sci. USA 48:2026), and adenine 
phosphoribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes, 
which can be employed in tk-, hgprt- or aprt- cells, respectively. 
Also, antimetabolite resistance can be used as the basis of 
selection for the following genes: dhfr, which confers 
resistance to methotrexate (Wigler, et al., 1980, Proc. Natl. Acad. 
Sci. USA 77:3567; O'Hare, et al., 1981, Proc. Natl. Acad. Sci. 
USA 78:1527); gpt, which confers resistance to 
mycophenolic acid (Mulligan & Berg, 1981, Proc. Natl. Acad. Sci. 
USA 78:2072); neo, which confers resistance to the 
aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. Mol. 
Biol. 150:1); and hygro, which confers resistance to 
hygromycin (Santerre, et al., 1984, Gene 30:147). 

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Alternatively, any fusion protein can be readily purified 
by utilizing an antibody specific for the fusion protein being 
expressed. For example, a system described by Janknecht et 
al. allows for the ready purification of non-denatured fusion 
proteins expressed in human cell lines (Janknecht, et al., 
1991, Proc. Nad. Acad. Sci. USA 88:8972-8976). In this 
system, the gene of interest is subcloned into a vaccinia 
recombination plasmid such that the gene's open reading 
frame is translationally fused to an amino-terminal tag 
consisting of six histidine residues. Extracts from cells 
infected with recombinant vaccinia virus are loaded onto 
Ni 2+ .nitriloacetic acid-agarose columns and histidine-tagged 
proteins are selectively eluted with imidazole-containing 
buffers. 

5.3 Antibodies to NHP Products 

Antibodies that specifically recognize one or more 
epitopes of a NHP, or epitopes of conserved variants of a 
NHP, or peptide fragments of a NHP are also encompassed 
by the invention. Such antibodies include but are not limited 
to polyclonal antibodies, monoclonal antibodies (mAbs), 
humanized or chimeric antibodies, single chain antibodies, 
Fab fragments, F(ab')2 fragments, fragments produced by a
Fab expression library, anti-idiotypic (anti-Id) antibodies,
and epitope-binding fragments of any of the above.

The antibodies of the invention may be used, for example, 
in the detection of NHP in a biological sample and may, 
therefore, be utilized as part of a diagnostic or prognostic 
technique whereby patients may be tested for abnormal 
amounts of NHP. Such antibodies may also be utilized in 
conjunction with, for example, compound screening 
schemes for the evaluation of the effect of test compounds 
on expression and/or activity of a NHP gene product. 
Additionally, such antibodies can be used in conjunction
with gene therapy to, for example, evaluate the normal and/or
engineered NHP-expressing cells prior to their introduction
into the patient. Such antibodies may additionally be used as
a method for the inhibition of abnormal NHP activity. Thus,
such antibodies may be utilized as part of treatment
methods.

For the production of antibodies, various host animals 
may be immunized by injection with the NHP, an NHP 
peptide (e.g., one corresponding to a functional domain of
an NHP), truncated NHP polypeptides (NHP in which one or
more domains have been deleted), functional equivalents of
the NHP or mutated variants of the NHP. Such host animals
may include but are not limited to pigs, rabbits, mice, goats, 
and rats, to name but a few. Various adjuvants may be used 
to increase the immunological response, depending on the 
host species, including but not limited to Freund's adjuvant 
(complete and incomplete), mineral salts such as aluminum 
hydroxide or aluminum phosphate, surface active substances 
such as lysolecithin, pluronic polyols, polyanions, peptides, 
oil emulsions, and potentially useful human adjuvants such 
as BCG (bacille Calmette-Guerin) and Corynebacterium
parvum. Alternatively, the immune response could be
enhanced by combination and/or coupling with molecules
such as keyhole limpet hemocyanin, tetanus toxoid, diphtheria
toxoid, ovalbumin, cholera toxin or fragments thereof.
Polyclonal antibodies are heterogeneous populations of anti- 
body molecules derived from the sera of the immunized 
animals. 

Monoclonal antibodies, which are homogeneous popula- 
tions of antibodies to a particular antigen, can be obtained by 
any technique which provides for the production of antibody 
molecules by continuous cell lines in culture. These include, 
but are not limited to, the hybridoma technique of Kohler 
and Milstein, (1975, Nature 256:495-497; and U.S. Pat. No. 
4,376,110), the human B-cell hybridoma technique (Kosbor
et al., 1983, Immunology Today 4:72; Cole et al., 1983,
Proc. Natl. Acad. Sci. USA 80:2026-2030), and the EBV- 
hybridoma technique (Cole et al., 1985, Monoclonal Anti- 
bodies And Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). 

Such antibodies may be of any immunoglobulin class
including IgG, IgM, IgE, IgA, IgD and any subclass thereof.
The hybridoma producing the mAb of this invention may be
cultivated in vitro or in vivo. Production of high titers of
mAbs in vivo makes this the presently preferred method of
production.

In addition, techniques developed for the production of
"chimeric antibodies" (Morrison et al., 1984, Proc. Natl.
Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature,
312:604-608; Takeda et al., 1985, Nature, 314:452-454) by
splicing the genes from a mouse antibody molecule of
appropriate antigen specificity together with genes from a
human antibody molecule of appropriate biological activity
can be used. A chimeric antibody is a molecule in which
different portions are derived from different animal species,
such as those having a variable region derived from a murine
mAb and a human immunoglobulin constant region. Such
technologies are described in U.S. Pat. Nos. 6,075,181 and
5,877,397 and their respective disclosures, which are herein
incorporated by reference in their entirety.

Alternatively, techniques described for the production of 
single chain antibodies (U.S. Pat. No. 4,946,778; Bird, 1988, 
Science 242:423-426; Huston et al., 1988, Proc. Natl. Acad. 
Sci. USA 85:5879-5883; and Ward et al., 1989, Nature
334:544-546) can be adapted to produce single chain
antibodies against NHP gene products. Single chain antibodies
are formed by linking the heavy and light chain fragments of 
the Fv region via an amino acid bridge, resulting in a single 
chain polypeptide. 

Antibody fragments which recognize specific epitopes
may be generated by known techniques. For example, such
fragments include, but are not limited to: the F(ab')2
fragments which can be produced by pepsin digestion of the
antibody molecule and the Fab fragments which can be
generated by reducing the disulfide bridges of the F(ab')2
fragments. Alternatively, Fab expression libraries may be
constructed (Huse et al., 1989, Science, 246:1275-1281) to
allow rapid and easy identification of monoclonal Fab
fragments with the desired specificity.

Antibodies to a NHP can, in turn, be utilized to generate
anti-idiotype antibodies that "mimic" a given NHP, using
techniques well known to those skilled in the art. (See, e.g.,
Greenspan & Bona, 1993, FASEB J 7(5):437-444; and
Nisonoff, 1991, J. Immunol. 147(8):2429-2438). For
example, antibodies which bind to a NHP domain and
competitively inhibit the binding of NHP to its cognate
receptor can be used to generate anti-idiotypes that "mimic"
the NHP and, therefore, bind and activate or neutralize a
receptor. Such anti-idiotypic antibodies or Fab fragments of
such anti-idiotypes can be used in therapeutic regimens
involving a NHP mediated pathway.

The present invention is not to be limited in scope by the
specific embodiments described herein, which are intended
as single illustrations of individual aspects of the invention,
and functionally equivalent methods and components are
within the scope of the invention. Indeed, various
modifications of the invention, in addition to those shown and
described herein, will become apparent to those skilled in the
art from the foregoing description. Such modifications are
intended to fall within the scope of the appended claims. All
cited publications, patents, and patent applications are herein
incorporated by reference in their entirety.

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 17

<210> SEQ ID NO 1

<211> LENGTH: 2919

<212> TYPE: DNA

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 1

atgactgagg ctctccaatg ggccagatat cactggcgac ggctgatcag aggtgcaacc 60
agggatgatg attcagggcc atacaactat tcctcgttgc tcgcctgtgg gcgcaagtcc 120
tctcagatcc ctaaactgtc aggaaggcac cggattgttg ttccccacat ccagcccttc 180
aaggatgagt atgagaagtt ctccggagcc tatgtgaaca atcgaatacg aacaacaaag 240
tacacacttc tgaattttgt gccaagaaat ttatttgaac aatttcacag agctgccaat 300
ttatatttcc tgttcctagt tgtcctgaac tgggtacctt tggtagaagc cttccaaaag 360
gaaatcacca tgttgcctct ggtggtggtc cttacaatta tcgcaattaa agatggcctg 420
gaagattatc ggaaatacaa aattgacaaa cagatcaata atttaataac taaagtttat 480
agtaggaaag agaaaaaata cattgaccga tgctggaaag acgttactgt tggggacttt 540
attcgcctct cctgcaacga ggtcatccct gcagacatgg tactactctt ttccactgat 600
ccagatggaa tctgtcacat tgagacttct ggtcttgatg gagagagcaa tttaaaacag 660
aggcaggtgg ttcggggata tgcagaacag gactctgaag ttgatcctga gaagttttcc 720
agtaggatag aatgtgaaag cccaaacaat gacctcagca gattccgagg cttcctagaa 780
cattccaaca aagaacgcgt gggtctcagt aaagaaaatt tgttgcttag aggatgcacc 840
attagaaaca cagaggctgt tgtgggcatt gtggtttatg caggccatga aaccaaagca 900
atgctgaaca acagtgggcc acggtataag cgcagcaaat tagaaagaag agcaaacaca 960
gatgtcctct ggtgtgtcat gcttctggtc ataatgtgct taactggcgc agtaggtcat 1020
ggaatctggc tgagcaggta tgaaaagatg cattttttca atgttcccga gcctgatgga 1080
catatcatat caccactgtt ggcaggattt tatatgtttt ggaccatgat cattttgtta 1140
caggtcttga ttcctatttc tctctatgtt tccatcgaaa ttgtgaagct tggacaaata 1200
tatttcattc aaagtgatgt ggatttctac aatgaaaaaa tggattctat tgttcagtgc 1260
cgagccctga acatcgccga ggatctggga cagattcagt acctcttttc cgataagaca 1320
ggaaccctca ctgagaataa gatggttttt cgaagatgta gtgtggcagg atttgattac 1380
tgccatgaag aaaatgccag gaggttggag tcctatcagg aagctgtctc tgaagatgaa 1440
gattttatag acacagtcag tggttccctc agcaatatgg caaaaccgag agcccccagc 1500
tgcaggacag ttcataatgg gcctttggga aataagccct caaatcatct tgctgggagc 1560
tcttttactc taggaagtgg agaaggagcc agtgaagtgc ctcattccag acaggctgct 1620
ttcagtagcc ccattgaaac agacgtggta ccagacacca ggcttttaga caaatttagt 1680
cagattacac ctcggctctt tatgccacta gatgagacca tccaaaatcc accaatggaa 1740
actttgtaca ttatcgactt tttcattgca ttggcaattt gcaacacagt agtggtttct 1800
gctcctaacc aaccccgaca aaagatcaga cacccttcac tgggggggtt gcccattaag 1860
tctttggaag agattaaaag tcttttccag agatggtctg tccgaagatc aagttctcca 1920
tcgcttaaca gtgggaaaga gccatcttct ggagttccaa acgcctttgt gagcagactc 1980
cctctcttta gtcgaatgaa accagcttca cctgtggagg aagaggtctc ccaggtgtgt 2040
gagagccccc agtgctccag tagctcagct tgctgcacag aaacagagaa acaacacggt 2100
gatgcaggcc tcctgaatgg caaggcagag tccctccctg gacagccatt ggcctgcaac 2160
ctgtgttatg aggccgagag cccagacgaa gcggccttag tgtatgccgc cagggcttac 2220
caatgcactt tacggtctcg gacaccagag caggtcatgg tggactttgc tgctttggga 2280
ccattaacat ttcaactcct acacatcctg ccctttgact cagtaagaaa aagaatgtct 2340
gttgtggtcc gacaccctct ttccaatcaa gttgtggtgt atacgaaagg cgctgattct 2400
gtgatcatgg agttactgtc ggtggcttcc ccagatggag caagtctgga gaaacaacag 2460
atgatagtaa gggagaaaac ccagaagcac ttggatgact atgccaaaca aggccttcgt 2520
actttatgta tagcaaagaa ggtcatgagt gacactgaat atgcagagtg gctgaggaat 2580
cattttttag ctgaaaccag cattgacaac agggaagaat tactacttga atctgccatg 2640
aggttggaga acaaacttac attacttggt gctactggca ttgaagaccg tctgcaggag 2700
ggagtccctg aatctataga agctcttcac aaagcgggca tcaagatctg gatgctgaca 2760
ggggacaagc aggagacagc tgtcaacata gcttatgcat gcaaactact ggagccagat 2820
gacaagcttt ttatcctcaa tacccaaagt aaagtgcgta tattgagatt aaatctgttc 2880
ttctgtattt tcaaaggcat tggaacattt gagatttga 2919


<210> SEQ ID NO 2 

<211> LENGTH: 972 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 2 

Met Thr Glu Ala Leu Gln Trp Ala Arg Tyr His Trp Arg Arg Leu Ile
1 5 10 15

Arg Gly Ala Thr Arg Asp Asp Asp Ser Gly Pro Tyr Asn Tyr Ser Ser
20 25 30

Leu Leu Ala Cys Gly Arg Lys Ser Ser Gln Ile Pro Lys Leu Ser Gly
35 40 45

Arg His Arg Ile Val Val Pro His Ile Gln Pro Phe Lys Asp Glu Tyr
50 55 60

Glu Lys Phe Ser Gly Ala Tyr Val Asn Asn Arg Ile Arg Thr Thr Lys
65 70 75 80

Tyr Thr Leu Leu Asn Phe Val Pro Arg Asn Leu Phe Glu Gln Phe His
85 90 95

Arg Ala Ala Asn Leu Tyr Phe Leu Phe Leu Val Val Leu Asn Trp Val
100 105 110

Pro Leu Val Glu Ala Phe Gln Lys Glu Ile Thr Met Leu Pro Leu Val
115 120 125

Val Val Leu Thr Ile Ile Ala Ile Lys Asp Gly Leu Glu Asp Tyr Arg
130 135 140

Lys Tyr Lys Ile Asp Lys Gln Ile Asn Asn Leu Ile Thr Lys Val Tyr
145 150 155 160

Ser Arg Lys Glu Lys Lys Tyr Ile Asp Arg Cys Trp Lys Asp Val Thr
165 170 175

Val Gly Asp Phe Ile Arg Leu Ser Cys Asn Glu Val Ile Pro Ala Asp
180 185 190

Met Val Leu Leu Phe Ser Thr Asp Pro Asp Gly Ile Cys His Ile Glu
195 200 205

Thr Ser Gly Leu Asp Gly Glu Ser Asn Leu Lys Gln Arg Gln Val Val
210 215 220

Arg Gly Tyr Ala Glu Gln Asp Ser Glu Val Asp Pro Glu Lys Phe Ser
225 230 235 240

Ser Arg Ile Glu Cys Glu Ser Pro Asn Asn Asp Leu Ser Arg Phe Arg
245 250 255

Gly Phe Leu Glu His Ser Asn Lys Glu Arg Val Gly Leu Ser Lys Glu
260 265 270

Asn Leu Leu Leu Arg Gly Cys Thr Ile Arg Asn Thr Glu Ala Val Val
275 280 285

Gly Ile Val Val Tyr Ala Gly His Glu Thr Lys Ala Met Leu Asn Asn
290 295 300

Ser Gly Pro Arg Tyr Lys Arg Ser Lys Leu Glu Arg Arg Ala Asn Thr
305 310 315 320

Asp Val Leu Trp Cys Val Met Leu Leu Val Ile Met Cys Leu Thr Gly
325 330 335

Ala Val Gly His Gly Ile Trp Leu Ser Arg Tyr Glu Lys Met His Phe
340 345 350

Phe Asn Val Pro Glu Pro Asp Gly His Ile Ile Ser Pro Leu Leu Ala
355 360 365

Gly Phe Tyr Met Phe Trp Thr Met Ile Ile Leu Leu Gln Val Leu Ile
370 375 380

Pro Ile Ser Leu Tyr Val Ser Ile Glu Ile Val Lys Leu Gly Gln Ile
385 390 395 400

Tyr Phe Ile Gln Ser Asp Val Asp Phe Tyr Asn Glu Lys Met Asp Ser
405 410 415

Ile Val Gln Cys Arg Ala Leu Asn Ile Ala Glu Asp Leu Gly Gln Ile
420 425 430

Gln Tyr Leu Phe Ser Asp Lys Thr Gly Thr Leu Thr Glu Asn Lys Met
435 440 445

Val Phe Arg Arg Cys Ser Val Ala Gly Phe Asp Tyr Cys His Glu Glu
450 455 460

Asn Ala Arg Arg Leu Glu Ser Tyr Gln Glu Ala Val Ser Glu Asp Glu
465 470 475 480

Asp Phe Ile Asp Thr Val Ser Gly Ser Leu Ser Asn Met Ala Lys Pro
485 490 495

Arg Ala Pro Ser Cys Arg Thr Val His Asn Gly Pro Leu Gly Asn Lys
500 505 510

Pro Ser Asn His Leu Ala Gly Ser Ser Phe Thr Leu Gly Ser Gly Glu
515 520 525

Gly Ala Ser Glu Val Pro His Ser Arg Gln Ala Ala Phe Ser Ser Pro
530 535 540

Ile Glu Thr Asp Val Val Pro Asp Thr Arg Leu Leu Asp Lys Phe Ser
545 550 555 560

Gln Ile Thr Pro Arg Leu Phe Met Pro Leu Asp Glu Thr Ile Gln Asn
565 570 575

Pro Pro Met Glu Thr Leu Tyr Ile Ile Asp Phe Phe Ile Ala Leu Ala
580 585 590

Ile Cys Asn Thr Val Val Val Ser Ala Pro Asn Gln Pro Arg Gln Lys
595 600 605

Ile Arg His Pro Ser Leu Gly Gly Leu Pro Ile Lys Ser Leu Glu Glu
610 615 620

Ile Lys Ser Leu Phe Gln Arg Trp Ser Val Arg Arg Ser Ser Ser Pro
625 630 635 640

Ser Leu Asn Ser Gly Lys Glu Pro Ser Ser Gly Val Pro Asn Ala Phe
645 650 655

Val Ser Arg Leu Pro Leu Phe Ser Arg Met Lys Pro Ala Ser Pro Val
660 665 670

Glu Glu Glu Val Ser Gln Val Cys Glu Ser Pro Gln Cys Ser Ser Ser
675 680 685

Ser Ala Cys Cys Thr Glu Thr Glu Lys Gln His Gly Asp Ala Gly Leu
690 695 700

Leu Asn Gly Lys Ala Glu Ser Leu Pro Gly Gln Pro Leu Ala Cys Asn
705 710 715 720

Leu Cys Tyr Glu Ala Glu Ser Pro Asp Glu Ala Ala Leu Val Tyr Ala
725 730 735

Ala Arg Ala Tyr Gln Cys Thr Leu Arg Ser Arg Thr Pro Glu Gln Val
740 745 750

Met Val Asp Phe Ala Ala Leu Gly Pro Leu Thr Phe Gln Leu Leu His
755 760 765

Ile Leu Pro Phe Asp Ser Val Arg Lys Arg Met Ser Val Val Val Arg
770 775 780

His Pro Leu Ser Asn Gln Val Val Val Tyr Thr Lys Gly Ala Asp Ser
785 790 795 800

Val Ile Met Glu Leu Leu Ser Val Ala Ser Pro Asp Gly Ala Ser Leu
805 810 815

Glu Lys Gln Gln Met Ile Val Arg Glu Lys Thr Gln Lys His Leu Asp
820 825 830

Asp Tyr Ala Lys Gln Gly Leu Arg Thr Leu Cys Ile Ala Lys Lys Val
835 840 845

Met Ser Asp Thr Glu Tyr Ala Glu Trp Leu Arg Asn His Phe Leu Ala
850 855 860

Glu Thr Ser Ile Asp Asn Arg Glu Glu Leu Leu Leu Glu Ser Ala Met
865 870 875 880

Arg Leu Glu Asn Lys Leu Thr Leu Leu Gly Ala Thr Gly Ile Glu Asp
885 890 895

Arg Leu Gln Glu Gly Val Pro Glu Ser Ile Glu Ala Leu His Lys Ala
900 905 910

Gly Ile Lys Ile Trp Met Leu Thr Gly Asp Lys Gln Glu Thr Ala Val
915 920 925

Asn Ile Ala Tyr Ala Cys Lys Leu Leu Glu Pro Asp Asp Lys Leu Phe
930 935 940

Ile Leu Asn Thr Gln Ser Lys Val Arg Ile Leu Arg Leu Asn Leu Phe
945 950 955 960

Phe Cys Ile Phe Lys Gly Ile Gly Thr Phe Glu Ile
965 970



<210> SEQ ID NO 3 

<211> LENGTH: 375 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 3 

atgagtgaca ctgaatatgc agagtggctg aggaatcatt ttttagctga aaccagcatt 60 

gacaacaggg aagaattact acttgaatct gccatgaggt tggagaacaa acttacatta 120 

cttggtgcta ctggcattga agaccgtctg caggagggag tccctgaatc tatagaagct 180 

cttcacaaag cgggcatcaa gatctggatg ctgacagggg acaagcagga gacagctgtc 240

aacatagctt atgcatgcaa actactggag ccagatgaca agctttttat cctcaatacc 300

caaagtaaag tgcgtatatt gagattaaat ctgttcttct gtattttcaa aggcattgga 360

acatttgaga tttga 375



<210> SEQ ID NO 4

<211> LENGTH: 124

<212> TYPE: PRT

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 4

Met Ser Asp Thr Glu Tyr Ala Glu Trp Leu Arg Asn His Phe Leu Ala
1 5 10 15

Glu Thr Ser Ile Asp Asn Arg Glu Glu Leu Leu Leu Glu Ser Ala Met
20 25 30

Arg Leu Glu Asn Lys Leu Thr Leu Leu Gly Ala Thr Gly Ile Glu Asp
35 40 45

Arg Leu Gln Glu Gly Val Pro Glu Ser Ile Glu Ala Leu His Lys Ala
50 55 60

Gly Ile Lys Ile Trp Met Leu Thr Gly Asp Lys Gln Glu Thr Ala Val
65 70 75 80

Asn Ile Ala Tyr Ala Cys Lys Leu Leu Glu Pro Asp Asp Lys Leu Phe
85 90 95

Ile Leu Asn Thr Gln Ser Lys Val Arg Ile Leu Arg Leu Asn Leu Phe
100 105 110

Phe Cys Ile Phe Lys Gly Ile Gly Thr Phe Glu Ile
115 120
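
As an illustrative cross-check only, and not part of the listing as filed, the opening of SEQ ID NO: 3 can be translated with the standard genetic code and compared with the opening of SEQ ID NO: 4. The following is a minimal Python sketch under stated assumptions: the names CODON_TABLE, translate and SEQ3_FIRST_60 are hypothetical and introduced only for this illustration, the standard nuclear genetic code is assumed to apply, and translation is assumed to start at the first nucleotide.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# (an assumption of this sketch; the listing itself names no code table).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1 to one-letter amino acids.

    Whitespace is ignored, translation stops at the first stop codon,
    and any trailing partial codon is dropped.
    """
    dna = "".join(dna.split()).lower()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue)
    return "".join(protein)


# First 60 nucleotides of SEQ ID NO: 3, copied from the listing above.
SEQ3_FIRST_60 = (
    "atgagtgaca ctgaatatgc agagtggctg aggaatcatt ttttagctga aaccagcatt"
)

print(translate(SEQ3_FIRST_60))  # MSDTEYAEWLRNHFLAETSI
```

The printed one-letter string corresponds to Met Ser Asp Thr Glu Tyr Ala Glu Trp Leu Arg Asn His Phe Leu Ala Glu Thr Ser Ile, the first twenty residues listed for SEQ ID NO: 4.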



<210> SEQ ID NO 5 

<211> LENGTH: 3171 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 5 

atgactgagg ctctccaatg ggccagatat cactggcgac ggctgatcag aggtgcaacc 60 

agggatgatg attcagggcc atacaactat tcctcgttgc tcgcctgtgg gcgcaagtcc 120 

tctcagatcc ctaaactgtc aggaaggcac cggattgttg ttccccacat ccagcccttc 180 

aaggatgagt atgagaagtt ctccggagcc tatgtgaaca atcgaatacg aacaacaaag 240 

tacacacttc tgaattttgt gccaagaaat ttatttgaac aatttcacag agctgccaat 300 

ttatatttcc tgttcctagt tgtcctgaac tgggtacctt tggtagaagc cttccaaaag 360 

gaaatcacca tgttgcctct ggtggtggtc cttacaatta tcgcaattaa agatggcctg 420 

gaagattatc ggaaatacaa aattgacaaa cagatcaata atttaataac taaagtttat 480 

agtaggaaag agaaaaaata cattgaccga tgctggaaag acgttactgt tggggacttt 540 

attcgcctct cctgcaacga ggtcatccct gcagacatgg tactactctt ttccactgat 600 

ccagatggaa tctgtcacat tgagacttct ggtcttgatg gagagagcaa tttaaaacag 660 

aggcaggtgg ttcggggata tgcagaacag gactctgaag ttgatcctga gaagttttcc 720 

agtaggatag aatgtgaaag cccaaacaat gacctcagca gattccgagg cttcctagaa 780 

cattccaaca aagaacgcgt gggtctcagt aaagaaaatt tgttgcttag aggatgcacc 840 

attagaaaca cagaggctgt tgtgggcatt gtggtttatg caggccatga aaccaaagca 900 

atgctgaaca acagtgggcc acggtataag cgcagcaaat tagaaagaag agcaaacaca 960
gatgtcctct ggtgtgtcat gcttctggtc ataatgtgct taactggcgc agtaggtcat 1020
ggaatctggc tgagcaggta tgaaaagatg cattttttca atgttcccga gcctgatgga 1080
catatcatat caccactgtt ggcaggattt tatatgtttt ggaccatgat cattttgtta 1140
caggtcttga ttcctatttc tctctatgtt tccatcgaaa ttgtgaagct tggacaaata 1200
tatttcattc aaagtgatgt ggatttctac aatgaaaaaa tggattctat tgttcagtgc 1260
cgagccctga acatcgccga ggatctggga cagattcagt acctcttttc cgataagaca 1320
ggaaccctca ctgagaataa gatggttttt cgaagatgta gtgtggcagg atttgattac 1380
tgccatgaag aaaatgccag gaggttggag tcctatcagg aagctgtctc tgaagatgaa 1440
gattttatag acacagtcag tggttccctc agcaatatgg caaaaccgag agcccccagc 1500
tgcaggacag ttcataatgg gcctttggga aataagccct caaatcatct tgctgggagc 1560
tcttttactc taggaagtgg agaaggagcc agtgaagtgc ctcattccag acaggctgct 1620
ttcagtagcc ccattgaaac agacgtggta ccagacacca ggcttttaga caaatttagt 1680
cagattacac ctcggctctt tatgccacta gatgagacca tccaaaatcc accaatggaa 1740
actttgtaca ttatcgactt tttcattgca ttggcaattt gcaacacagt agtggtttct 1800
gctcctaacc aaccccgaca aaagatcaga cacccttcac tgggggggtt gcccattaag 1860
tctttggaag agattaaaag tcttttccag agatggtctg tccgaagatc aagttctcca 1920
tcgcttaaca gtgggaaaga gccatcttct ggagttccaa acgcctttgt gagcagactc 1980
cctctcttta gtcgaatgaa accagcttca cctgtggagg aagaggtctc ccaggtgtgt 2040
gagagccccc agtgctccag tagctcagct tgctgcacag aaacagagaa acaacacggt 2100
gatgcaggcc tcctgaatgg caaggcagag tccctccctg gacagccatt ggcctgcaac 2160
ctgtgttatg aggccgagag cccagacgaa gcggccttag tgtatgccgc cagggcttac 2220
caatgcactt tacggtctcg gacaccagag caggtcatgg tggactttgc tgctttggga 2280
ccattaacat ttcaactcct acacatcctg ccctttgact cagtaagaaa aagaatgtct 2340
gttgtggtcc gacaccctct ttccaatcaa gttgtggtgt atacgaaagg cgctgattct 2400
gtgatcatgg agttactgtc ggtggcttcc ccagatggag caagtctgga gaaacaacag 2460
atgatagtaa gggagaaaac ccagaagcac ttggatgact atgccaaaca aggccttcgt 2520
actttatgta tagcaaagaa ggtcatgagt gacactgaat atgcagagtg gctgaggaat 2580
cattttttag ctgaaaccag cattgacaac agggaagaat tactacttga atctgccatg 2640
aggttggaga acaaacttac attacttggt gctactggca ttgaagaccg tctgcaggag 2700
ggagtccctg aatctataga agctcttcac aaagcgggca tcaagatctg gatgctgaca 2760
ggggacaagc aggagacagc tgtcaacata gcttatgcat gcaaactact ggagccagat 2820
gacaagcttt ttatcctcaa tacccaaagt aaagatgcct gtgggatgct gatgagcaca 2880
attttgaaag aacttcagaa gaaaactcaa gccctgccag agcaagtgtc attaagtgaa 2940
gatttacttc agcctcctgt cccccgggac tcagggttac gagctggact cattatcact 3000
gggaagaccc tggagtttgc cctgcaagaa agtctgcaaa agcagttcct ggaactgaca 3060 
tcttggtgtc aagctgtggt ctgctgccga gccacaccgc tgcagaaaag tgaagtggtg 3120 
aaattggtcc gcagccatct ccaggtgatg acccttgcta ttggtgagtg a 3171



<210> SEQ ID NO 6

<211> LENGTH: 1056

<212> TYPE: PRT

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 6

Met Thr Glu Ala Leu Gln Trp Ala Arg Tyr His Trp Arg Arg Leu Ile
1 5 10 15

Arg Gly Ala Thr Arg Asp Asp Asp Ser Gly Pro Tyr Asn Tyr Ser Ser
20 25 30

Leu Leu Ala Cys Gly Arg Lys Ser Ser Gln Ile Pro Lys Leu Ser Gly
35 40 45

Arg His Arg Ile Val Val Pro His Ile Gln Pro Phe Lys Asp Glu Tyr
50 55 60

Glu Lys Phe Ser Gly Ala Tyr Val Asn Asn Arg Ile Arg Thr Thr Lys
65 70 75 80

Tyr Thr Leu Leu Asn Phe Val Pro Arg Asn Leu Phe Glu Gln Phe His
85 90 95

Arg Ala Ala Asn Leu Tyr Phe Leu Phe Leu Val Val Leu Asn Trp Val
100 105 110

Pro Leu Val Glu Ala Phe Gln Lys Glu Ile Thr Met Leu Pro Leu Val
115 120 125

Val Val Leu Thr Ile Ile Ala Ile Lys Asp Gly Leu Glu Asp Tyr Arg
130 135 140

Lys Tyr Lys Ile Asp Lys Gln Ile Asn Asn Leu Ile Thr Lys Val Tyr
145 150 155 160

Ser Arg Lys Glu Lys Lys Tyr Ile Asp Arg Cys Trp Lys Asp Val Thr
165 170 175

Val Gly Asp Phe Ile Arg Leu Ser Cys Asn Glu Val Ile Pro Ala Asp
180 185 190

Met Val Leu Leu Phe Ser Thr Asp Pro Asp Gly Ile Cys His Ile Glu
195 200 205

Thr Ser Gly Leu Asp Gly Glu Ser Asn Leu Lys Gln Arg Gln Val Val
210 215 220

Arg Gly Tyr Ala Glu Gln Asp Ser Glu Val Asp Pro Glu Lys Phe Ser
225 230 235 240

Ser Arg Ile Glu Cys Glu Ser Pro Asn Asn Asp Leu Ser Arg Phe Arg
245 250 255

Gly Phe Leu Glu His Ser Asn Lys Glu Arg Val Gly Leu Ser Lys Glu
260 265 270

Asn Leu Leu Leu Arg Gly Cys Thr Ile Arg Asn Thr Glu Ala Val Val
275 280 285

Gly Ile Val Val Tyr Ala Gly His Glu Thr Lys Ala Met Leu Asn Asn
290 295 300

Ser Gly Pro Arg Tyr Lys Arg Ser Lys Leu Glu Arg Arg Ala Asn Thr
305 310 315 320

Asp Val Leu Trp Cys Val Met Leu Leu Val Ile Met Cys Leu Thr Gly
325 330 335

Ala Val Gly His Gly Ile Trp Leu Ser Arg Tyr Glu Lys Met His Phe
340 345 350

Phe Asn Val Pro Glu Pro Asp Gly His Ile Ile Ser Pro Leu Leu Ala
355 360 365

Gly Phe Tyr Met Phe Trp Thr Met Ile Ile Leu Leu Gln Val Leu Ile
370 375 380

Pro Ile Ser Leu Tyr Val Ser Ile Glu Ile Val Lys Leu Gly Gln Ile
385 390 395 400

Tyr Phe Ile Gln Ser Asp Val Asp Phe Tyr Asn Glu Lys Met Asp Ser
405 410 415

Ile Val Gln Cys Arg Ala Leu Asn Ile Ala Glu Asp Leu Gly Gln Ile
420 425 430

Gln Tyr Leu Phe Ser Asp Lys Thr Gly Thr Leu Thr Glu Asn Lys Met
435 440 445

Val Phe Arg Arg Cys Ser Val Ala Gly Phe Asp Tyr Cys His Glu Glu
450 455 460

Asn Ala Arg Arg Leu Glu Ser Tyr Gln Glu Ala Val Ser Glu Asp Glu
465 470 475 480

Asp Phe Ile Asp Thr Val Ser Gly Ser Leu Ser Asn Met Ala Lys Pro
485 490 495

Arg Ala Pro Ser Cys Arg Thr Val His Asn Gly Pro Leu Gly Asn Lys
500 505 510

Pro Ser Asn His Leu Ala Gly Ser Ser Phe Thr Leu Gly Ser Gly Glu
515 520 525

Gly Ala Ser Glu Val Pro His Ser Arg Gln Ala Ala Phe Ser Ser Pro
530 535 540

Ile Glu Thr Asp Val Val Pro Asp Thr Arg Leu Leu Asp Lys Phe Ser
545 550 555 560

Gln Ile Thr Pro Arg Leu Phe Met Pro Leu Asp Glu Thr Ile Gln Asn
565 570 575

Pro Pro Met Glu Thr Leu Tyr Ile Ile Asp Phe Phe Ile Ala Leu Ala
580 585 590

Ile Cys Asn Thr Val Val Val Ser Ala Pro Asn Gln Pro Arg Gln Lys
595 600 605

Ile Arg His Pro Ser Leu Gly Gly Leu Pro Ile Lys Ser Leu Glu Glu
610 615 620

Ile Lys Ser Leu Phe Gln Arg Trp Ser Val Arg Arg Ser Ser Ser Pro
625 630 635 640

Ser Leu Asn Ser Gly Lys Glu Pro Ser Ser Gly Val Pro Asn Ala Phe
645 650 655

Val Ser Arg Leu Pro Leu Phe Ser Arg Met Lys Pro Ala Ser Pro Val
660 665 670

Glu Glu Glu Val Ser Gln Val Cys Glu Ser Pro Gln Cys Ser Ser Ser
675 680 685

Ser Ala Cys Cys Thr Glu Thr Glu Lys Gln His Gly Asp Ala Gly Leu
690 695 700

Leu Asn Gly Lys Ala Glu Ser Leu Pro Gly Gln Pro Leu Ala Cys Asn
705 710 715 720

Leu Cys Tyr Glu Ala Glu Ser Pro Asp Glu Ala Ala Leu Val Tyr Ala
725 730 735

Ala Arg Ala Tyr Gln Cys Thr Leu Arg Ser Arg Thr Pro Glu Gln Val
740 745 750

Met Val Asp Phe Ala Ala Leu Gly Pro Leu Thr Phe Gln Leu Leu His
755 760 765

Ile Leu Pro Phe Asp Ser Val Arg Lys Arg Met Ser Val Val Val Arg
770 775 780

His Pro Leu Ser Asn Gln Val Val Val Tyr Thr Lys Gly Ala Asp Ser
785 790 795 800

Val Ile Met Glu Leu Leu Ser Val Ala Ser Pro Asp Gly Ala Ser Leu
805 810 815

Glu Lys Gln Gln Met Ile Val Arg Glu Lys Thr Gln Lys His Leu Asp
820 825 830

Asp Tyr Ala Lys Gln Gly Leu Arg Thr Leu Cys Ile Ala Lys Lys Val
835 840 845

Met Ser Asp Thr Glu Tyr Ala Glu Trp Leu Arg Asn His Phe Leu Ala
850 855 860

Glu Thr Ser Ile Asp Asn Arg Glu Glu Leu Leu Leu Glu Ser Ala Met
865 870 875 880

Arg Leu Glu Asn Lys Leu Thr Leu Leu Gly Ala Thr Gly Ile Glu Asp
885 890 895

Arg Leu Gln Glu Gly Val Pro Glu Ser Ile Glu Ala Leu His Lys Ala
900 905 910

Gly Ile Lys Ile Trp Met Leu Thr Gly Asp Lys Gln Glu Thr Ala Val
915 920 925

Asn Ile Ala Tyr Ala Cys Lys Leu Leu Glu Pro Asp Asp Lys Leu Phe
930 935 940

Ile Leu Asn Thr Gln Ser Lys Asp Ala Cys Gly Met Leu Met Ser Thr
945 950 955 960

Ile Leu Lys Glu Leu Gln Lys Lys Thr Gln Ala Leu Pro Glu Gln Val
965 970 975

Ser Leu Ser Glu Asp Leu Leu Gln Pro Pro Val Pro Arg Asp Ser Gly
980 985 990

Leu Arg Ala Gly Leu Ile Ile Thr Gly Lys Thr Leu Glu Phe Ala Leu
995 1000 1005

Gln Glu Ser Leu Gln Lys Gln Phe Leu Glu Leu Thr Ser Trp Cys Gln
1010 1015 1020

Ala Val Val Cys Cys Arg Ala Thr Pro Leu Gln Lys Ser Glu Val Val
1025 1030 1035 1040

Lys Leu Val Arg Ser His Leu Gln Val Met Thr Leu Ala Ile Gly Glu
1045 1050 1055
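
The lengths recited in the <211> fields above can also be checked against one another. The short Python sketch below is illustrative only and is not part of the listing as filed; it assumes that each DNA sequence is a single open reading frame that ends in exactly one stop codon, so that the nucleotide count equals three times the residue count plus three. The names LISTED_LENGTHS and is_consistent are hypothetical.

```python
# (DNA SEQ ID NO, protein SEQ ID NO) -> (nucleotides, residues),
# copied from the <211> LENGTH fields of SEQ ID NOS: 1-6.
LISTED_LENGTHS = {
    (1, 2): (2919, 972),
    (3, 4): (375, 124),
    (5, 6): (3171, 1056),
}


def is_consistent(nucleotides: int, residues: int) -> bool:
    """True if the DNA length equals one codon per residue plus one stop codon."""
    return nucleotides == 3 * (residues + 1)


for (dna_id, protein_id), (nucleotides, residues) in LISTED_LENGTHS.items():
    status = "consistent" if is_consistent(nucleotides, residues) else "inconsistent"
    print(f"SEQ ID NO: {dna_id} / SEQ ID NO: {protein_id}: {status}")
```

Under that assumption, each of the three pairs is reported as consistent, since 3 x 973 = 2919, 3 x 125 = 375, and 3 x 1057 = 3171.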



<210> SEQ ID NO 7 

<211> LENGTH: 627 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 7 

atgagtgaca ctgaatatgc agagtggctg aggaatcatt ttttagctga aaccagcatt 60 

gacaacaggg aagaattact acttgaatct gccatgaggt tggagaacaa acttacatta 120

cttggtgcta ctggcattga agaccgtctg caggagggag tccctgaatc tatagaagct 180 

cttcacaaag cgggcatcaa gatctggatg ctgacagggg acaagcagga gacagctgtc 240

aacatagctt atgcatgcaa actactggag ccagatgaca agctttttat cctcaatacc 300 

caaagtaaag atgcctgtgg gatgctgatg agcacaattt tgaaagaact tcagaagaaa 360 

actcaagccc tgccagagca agtgtcatta agtgaagatt tacttcagcc tcctgtcccc 420 

cgggactcag ggttacgagc tggactcatt atcactggga agaccctgga gtttgccctg 480 

caagaaagtc tgcaaaagca gttcctggaa ctgacatctt ggtgtcaagc tgtggtctgc 540

tgccgagcca caccgctgca gaaaagtgaa gtggtgaaat tggtccgcag ccatctccag 600 

gtgatgaccc ttgctattgg tgagtga 627



<210> SEQ ID NO 8

<211> LENGTH: 208

<212> TYPE: PRT

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 8

Met Ser Asp Thr Glu Tyr Ala Glu Trp Leu Arg Asn His Phe Leu Ala
1 5 10 15

Glu Thr Ser Ile Asp Asn Arg Glu Glu Leu Leu Leu Glu Ser Ala Met
20 25 30

Arg Leu Glu Asn Lys Leu Thr Leu Leu Gly Ala Thr Gly Ile Glu Asp
35 40 45

Arg Leu Gln Glu Gly Val Pro Glu Ser Ile Glu Ala Leu His Lys Ala
50 55 60

Gly Ile Lys Ile Trp Met Leu Thr Gly Asp Lys Gln Glu Thr Ala Val
65 70 75 80

Asn Ile Ala Tyr Ala Cys Lys Leu Leu Glu Pro Asp Asp Lys Leu Phe
85 90 95

Ile Leu Asn Thr Gln Ser Lys Asp Ala Cys Gly Met Leu Met Ser Thr
100 105 110

Ile Leu Lys Glu Leu Gln Lys Lys Thr Gln Ala Leu Pro Glu Gln Val
115 120 125

Ser Leu Ser Glu Asp Leu Leu Gln Pro Pro Val Pro Arg Asp Ser Gly
130 135 140

Leu Arg Ala Gly Leu Ile Ile Thr Gly Lys Thr Leu Glu Phe Ala Leu
145 150 155 160

Gln Glu Ser Leu Gln Lys Gln Phe Leu Glu Leu Thr Ser Trp Cys Gln
165 170 175

Ala Val Val Cys Cys Arg Ala Thr Pro Leu Gln Lys Ser Glu Val Val
180 185 190

Lys Leu Val Arg Ser His Leu Gln Val Met Thr Leu Ala Ile Gly Glu
195 200 205



<210> SEQ ID NO 9

<211> LENGTH: 3813

<212> TYPE: DNA

<213> ORGANISM: homo sapiens

<400> SEQUENCE: 9

atgactgagg ctctccaatg ggccagatat cactggcgac ggctgatcag aggtgcaacc 60
agggatgatg attcagggcc atacaactat tcctcgttgc tcgcctgtgg gcgcaagtcc 120
tctcagatcc ctaaactgtc aggaaggcac cggattgttg ttccccacat ccagcccttc 180
aaggatgagt atgagaagtt ctccggagcc tatgtgaaca atcgaatacg aacaacaaag 240
tacacacttc tgaattttgt gccaagaaat ttatttgaac aatttcacag agctgccaat 300
ttatatttcc tgttcctagt tgtcctgaac tgggtacctt tggtagaagc cttccaaaag 360
gaaatcacca tgttgcctct ggtggtggtc cttacaatta tcgcaattaa agatggcctg 420
gaagattatc ggaaatacaa aattgacaaa cagatcaata atttaataac taaagtttat 480
agtaggaaag agaaaaaata cattgaccga tgctggaaag acgttactgt tggggacttt 540
attcgcctct cctgcaacga ggtcatccct gcagacatgg tactactctt ttccactgat 600
ccagatggaa tctgtcacat tgagacttct ggtcttgatg gagagagcaa tttaaaacag 660
aggcaggtgg ttcggggata tgcagaacag gactctgaag ttgatcctga gaagttttcc 720
agtaggatag aatgtgaaag cccaaacaat gacctcagca gattccgagg cttcctagaa 780
cattccaaca aagaacgcgt gggtctcagt aaagaaaatt tgttgcttag aggatgcacc 840
attagaaaca cagaggctgt tgtgggcatt gtggtttatg caggccatga aaccaaagca 900
atgctgaaca acagtgggcc acggtataag cgcagcaaat tagaaagaag agcaaacaca 960
gatgtcctct ggtgtgtcat gcttctggtc ataatgtgct taactggcgc agtaggtcat 1020
ggaatctggc tgagcaggta tgaaaagatg cattttttca atgttcccga gcctgatgga 1080
catatcatat caccactgtt ggcaggattt tatatgtttt ggaccatgat cattttgtta 1140
caggtcttga ttcctatttc tctctatgtt tccatcgaaa ttgtgaagct tggacaaata 1200
tatttcattc aaagtgatgt ggatttctac aatgaaaaaa tggattctat tgttcagtgc 1260
cgagccctga acatcgccga ggatctggga cagattcagt acctcttttc cgataagaca 1320


ggaaccctca 


ctgagaataa 


gatggttttt 


cgaagatgta 


atataacaaa 


atttgattac 


1380 


tgccatgaag 


aaaatgccag 


qaqqttqqaa 

3 3 3 J 3 3 


tcctatcagg 


aagctgtctc 


tgaagatgaa 


1440 


gattttatag 


acacagtcag 


tggttccctc 


agcaatatgg 


caaaaccgag 


agcccccagc 


1500 


tqcaqqacaq 


ttcataatgg 


acctttaaaa 


aataagccct 


caaatcatct 


tactqqqaac 

1 *3 V ' **3 3 3 a 5 w 


1560 


tcttttactc 


taqqaaqtqq 

J J J w 3 3 


aqaaqqaqcc 


aqtqaaqtqc 

3 3 3 3 


ctcattccag 


acaggctget 


1620 


ttcagtagcc 


ccattgaaac 


aaacatcrota 

uvjw^^ k 33 


ccagacacca 


ggcttttaga 


caaatttagt 


1680 


cagattacac 


ctcggctctt 


tatgccacta 


gat gaga cca 


tccaaaatcc 


accaatggaa 


1740 


actttgtaca 


ttatcgactt 


tttcattgca 


ttggcaattt 


gcaacacagt 


agtggtttct 


1800 


gctcctaacc 


aaccccgaca 


aaagatcaga 


cacccttcac 


''333 3333*' u 


geccattaag 


1860 


tctttggaag 


agattaaaag 


tcttttccag 


aaataatcta 


tec ga agate 


aagttctcca 


1920 


tcgcttaaca 


otqciqaaaqa 


gccatcttct 


ggagttccaa 


aegectttgt 


gagcagactc 


1980 


cctctcttta 


gtcgaatgaa 


accagcttca 


cctqtqqaqq 

3 33 33 


aagaggtctc 


ccaqqtqtqt 

****"33 3 3 •» 


2040 


gagagccccc 


agtgctccag 


tagctcagct 


tgctgcacag 


aaacagagaa 


acaacac ggt 


2100 


gatgcaggcc 


tcctgaatgg 


caaqqcaqaq 

33 3 3 


tccctccctg 


gaeagecatt 


ggcctgcaac 


2160 


ctgtgttatg 


aqgccqaqaq 


cccagacgaa 


qcqqccttaq 

J J 3 w ^ ^ 3 


tgtat geege 


cagggcttac 


2220 


caatgcactt 


tacggtctcg 


gacaccagag 


caqqtcatqq 


tggactttgc 


tqctttaqqa 

w 3 w w 3 33 61 


2280 


ccattaacat 


ttcaactcct 


acacatcctg 


ccctttgact 


cagtaagaaa 


aagaatgtct 


2340 


gttqtqqtcc 


gacaccctct 


ttccaatcaa 


qttcrtctatat 


atacgaaagg 


egctgattet 


2400 


qtqatcatqq 


agttactgtc 


qqtaacttcc 

33 , *33 www% "*' 


ccaaataaaa 

3 3 3 3 


caagtctgga 


gaaacaacag 


2460 


atgatagtaa 


qqqaqaaaac 


ccagaagcac 


ttggatgact 


atgecaaaca 


aggecttegt 


2520 


actttatgta 


tagcaaagaa 


crcrtcataaat 

33 3 w 


gacactgaat 


atgcagagtg 


gctgaggaat 


2580 


cattttttag 


ctgaaaccag 


cattgacaac 


agggaagaat 


tactacttga 


atetgecatg 


2640 


aqqttqqaqa 


acaaacttac 


attacttggt 


gctactggca 


ttgaagaccg 


tctocaoaaa 

^ 3 W 3 3 " 3 


2700 


qqaqtccctq 
:? 3 3 j 


aatctataga 


agctcttcac 


aaagcgggca 


tcaagatctg 


gatgetgaca 


2760 


qqqqacaaqc 


aggagacagc 


tgtcaacata 


get tat gc at 


gcaaactact 


ggagecagat 


2820 


gacaagcttt 


ttatcctcaa 


tacccaaagt 


aaagatgect 


gtgggatgct 


gatgagcaca 


■ 2880 


attttgaaag 


aacttcagaa 


gaaaactcaa 


gccctgccag 


agcaagtgtc 


attaagtgaa 


2940 


gatttacttc 


agcctcctgt 


cccccgggac 


tcagggttac 


gagctggact 


cattatcact 


3000 


gggaagaccc 


tggagtttgc 


cctgcaagaa 


agtctgcaaa 


agcagttcct 


ggaactgaca 


3060 


tcttggtgtc 


aagctgtggt 


ctgctgccga 


gccncaccgc 


tgcagaaaag 


tgaagtggtg 


3120 


aaattggtcc 


gcagccatct 


ccaggtgatg 


acccttgcta 


ttggtgatgg 


tgecaatgat 


3180 


gttagcatga 


tacaagtggc 


agacattggg 


ataggggtct 


caggtcaaga 


aggcatgeag 


3240 


gctgtgatgg 


ccagtgactt 


tgccgtttct 


cagttcaaac 


atctcagcaa 


gctccttctt 


3300 
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gtccatggac actggtgtta 


tacacggctt 


tccaacatga 


ttctctattt tttctataag 


3360 


aatgtggcct atgtgaacct 


ccttttctgg 


taccagttct tttgtggatt ttcaggaaca 


3420 


tccatgactg attactgggt 


tttgatcttc 


ttcaacctcc 


tcttcacatc tgcccctcct 


3480 


gtcatttatg gtgttttgga 


gaaagatgtg 


tctgcagaga 


ccctcatgca actgcctgaa 


3540 


ctttacagaa gtggtcagaa 


atcagaggca 


tacttacccc 


ataccttctg gatcacctta 


3600 


ttggatgctt tttatcaaag 


cctggtctgc 


ttctttgtgc 


cttattttac ctaccagggc 


3660 


tcagatactg acatctttgc 


atttggaaac 


cccctgaaca 


cagccactct gttcatcgtt 


3720 


ctcctccatc tggtcattga 


aagcaagagt 


ttgaccaggt 


gcagtgactc acacctgcaa 


3780 


ttccagagct ttgggaggct 


gtggatcaca 


tga 




3813 



<210> SEQ ID NO 10 

<211> LENGTH: 1270 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 10 

Met Thr Glu Ala Leu Gln Trp Ala Arg Tyr His Trp Arg Arg Leu Ile 
1 5 10 15 

Arg Gly Ala Thr Arg Asp Asp Asp Ser Gly Pro Tyr Asn Tyr Ser Ser 
20 25 30 

Leu Leu Ala Cys Gly Arg Lys Ser Ser Gln Ile Pro Lys Leu Ser Gly 
35 40 45 

Arg His Arg Ile Val Val Pro His Ile Gln Pro Phe Lys Asp Glu Tyr 
50 55 60 

Glu Lys Phe Ser Gly Ala Tyr Val Asn Asn Arg Ile Arg Thr Thr Lys 
65 70 75 80 

Tyr Thr Leu Leu Asn Phe Val Pro Arg Asn Leu Phe Glu Gln Phe His 
85 90 95 

Arg Ala Ala Asn Leu Tyr Phe Leu Phe Leu Val Val Leu Asn Trp Val 
100 105 110 

Pro Leu Val Glu Ala Phe Gln Lys Glu Ile Thr Met Leu Pro Leu Val 
115 120 125 

Val Val Leu Thr Ile Ile Ala Ile Lys Asp Gly Leu Glu Asp Tyr Arg 
130 135 140 

Lys Tyr Lys Ile Asp Lys Gln Ile Asn Asn Leu Ile Thr Lys Val Tyr 
145 150 155 160 

Ser Arg Lys Glu Lys Lys Tyr Ile Asp Arg Cys Trp Lys Asp Val Thr 
165 170 175 

Val Gly Asp Phe Ile Arg Leu Ser Cys Asn Glu Val Ile Pro Ala Asp 
180 185 190 

Met Val Leu Leu Phe Ser Thr Asp Pro Asp Gly Ile Cys His Ile Glu 
195 200 205 

Thr Ser Gly Leu Asp Gly Glu Ser Asn Leu Lys Gln Arg Gln Val Val 
210 215 220 

Arg Gly Tyr Ala Glu Gln Asp Ser Glu Val Asp Pro Glu Lys Phe Ser 
225 230 235 240 

Ser Arg Ile Glu Cys Glu Ser Pro Asn Asn Asp Leu Ser Arg Phe Arg 
245 250 255 

Gly Phe Leu Glu His Ser Asn Lys Glu Arg Val Gly Leu Ser Lys Glu 
260 265 270 

Asn Leu Leu Leu Arg Gly Cys Thr Ile Arg Asn Thr Glu Ala Val Val 
275 280 285 

Gly Ile Val Val Tyr Ala Gly His Glu Thr Lys Ala Met Leu Asn Asn 
290 295 300 

Ser Gly Pro Arg Tyr Lys Arg Ser Lys Leu Glu Arg Arg Ala Asn Thr 
305 310 315 320 

Asp Val Leu Trp Cys Val Met Leu Leu Val Ile Met Cys Leu Thr Gly 
325 330 335 

Ala Val Gly His Gly Ile Trp Leu Ser Arg Tyr Glu Lys Met His Phe 
340 345 350 

Phe Asn Val Pro Glu Pro Asp Gly His Ile Ile Ser Pro Leu Leu Ala 
355 360 365 

Gly Phe Tyr Met Phe Trp Thr Met Ile Ile Leu Leu Gln Val Leu Ile 
370 375 380 

Pro Ile Ser Leu Tyr Val Ser Ile Glu Ile Val Lys Leu Gly Gln Ile 
385 390 395 400 

Tyr Phe Ile Gln Ser Asp Val Asp Phe Tyr Asn Glu Lys Met Asp Ser 
405 410 415 

Ile Val Gln Cys Arg Ala Leu Asn Ile Ala Glu Asp Leu Gly Gln Ile 
420 425 430 

Gln Tyr Leu Phe Ser Asp Lys Thr Gly Thr Leu Thr Glu Asn Lys Met 
435 440 445 

Val Phe Arg Arg Cys Ser Val Ala Gly Phe Asp Tyr Cys His Glu Glu 
450 455 460 

Asn Ala Arg Arg Leu Glu Ser Tyr Gln Glu Ala Val Ser Glu Asp Glu 
465 470 475 480 

Asp Phe Ile Asp Thr Val Ser Gly Ser Leu Ser Asn Met Ala Lys Pro 
485 490 495 

Arg Ala Pro Ser Cys Arg Thr Val His Asn Gly Pro Leu Gly Asn Lys 
500 505 510 

Pro Ser Asn His Leu Ala Gly Ser Ser Phe Thr Leu Gly Ser Gly Glu 
515 520 525 

Gly Ala Ser Glu Val Pro His Ser Arg Gln Ala Ala Phe Ser Ser Pro 
530 535 540 

Ile Glu Thr Asp Val Val Pro Asp Thr Arg Leu Leu Asp Lys Phe Ser 
545 550 555 560 

Gln Ile Thr Pro Arg Leu Phe Met Pro Leu Asp Glu Thr Ile Gln Asn 
565 570 575 

Pro Pro Met Glu Thr Leu Tyr Ile Ile Asp Phe Phe Ile Ala Leu Ala 
580 585 590 

Ile Cys Asn Thr Val Val Val Ser Ala Pro Asn Gln Pro Arg Gln Lys 
595 600 605 

Ile Arg His Pro Ser Leu Gly Gly Leu Pro Ile Lys Ser Leu Glu Glu 
610 615 620 

Ile Lys Ser Leu Phe Gln Arg Trp Ser Val Arg Arg Ser Ser Ser Pro 
625 630 635 640 

Ser Leu Asn Ser Gly Lys Glu Pro Ser Ser Gly Val Pro Asn Ala Phe 
645 650 655 

Val Ser Arg Leu Pro Leu Phe Ser Arg Met Lys Pro Ala Ser Pro Val 
660 665 670 

Glu Glu Glu Val Ser Gln Val Cys Glu Ser Pro Gln Cys Ser Ser Ser 
675 680 685 

Ser Ala Cys Cys Thr Glu Thr Glu Lys Gln His Gly Asp Ala Gly Leu 
690 695 700 

Leu Asn Gly Lys Ala Glu Ser Leu Pro Gly Gln Pro Leu Ala Cys Asn 
705 710 715 720 

Leu Cys Tyr Glu Ala Glu Ser Pro Asp Glu Ala Ala Leu Val Tyr Ala 
725 730 735 

Ala Arg Ala Tyr Gln Cys Thr Leu Arg Ser Arg Thr Pro Glu Gln Val 
740 745 750 

Met Val Asp Phe Ala Ala Leu Gly Pro Leu Thr Phe Gln Leu Leu His 
755 760 765 

Ile Leu Pro Phe Asp Ser Val Arg Lys Arg Met Ser Val Val Val Arg 
770 775 780 

His Pro Leu Ser Asn Gln Val Val Val Tyr Thr Lys Gly Ala Asp Ser 
785 790 795 800 

Val Ile Met Glu Leu Leu Ser Val Ala Ser Pro Asp Gly Ala Ser Leu 
805 810 815 

Glu Lys Gln Gln Met Ile Val Arg Glu Lys Thr Gln Lys His Leu Asp 
820 825 830 

Asp Tyr Ala Lys Gln Gly Leu Arg Thr Leu Cys Ile Ala Lys Lys Val 
835 840 845 

Met Ser Asp Thr Glu Tyr Ala Glu Trp Leu Arg Asn His Phe Leu Ala 
850 855 860 

Glu Thr Ser Ile Asp Asn Arg Glu Glu Leu Leu Leu Glu Ser Ala Met 
865 870 875 880 

Arg Leu Glu Asn Lys Leu Thr Leu Leu Gly Ala Thr Gly Ile Glu Asp 
885 890 895 

Arg Leu Gln Glu Gly Val Pro Glu Ser Ile Glu Ala Leu His Lys Ala 
900 905 910 

Gly Ile Lys Ile Trp Met Leu Thr Gly Asp Lys Gln Glu Thr Ala Val 
915 920 925 

Asn Ile Ala Tyr Ala Cys Lys Leu Leu Glu Pro Asp Asp Lys Leu Phe 
930 935 940 

Ile Leu Asn Thr Gln Ser Lys Asp Ala Cys Gly Met Leu Met Ser Thr 
945 950 955 960 

Ile Leu Lys Glu Leu Gln Lys Lys Thr Gln Ala Leu Pro Glu Gln Val 
965 970 975 

Ser Leu Ser Glu Asp Leu Leu Gln Pro Pro Val Pro Arg Asp Ser Gly 
980 985 990 

Leu Arg Ala Gly Leu Ile Ile Thr Gly Lys Thr Leu Glu Phe Ala Leu 
995 1000 1005 

Gln Glu Ser Leu Gln Lys Gln Phe Leu Glu Leu Thr Ser Trp Cys Gln 
1010 1015 1020 

Ala Val Val Cys Cys Arg Ala Thr Pro Leu Gln Lys Ser Glu Val Val 
1025 1030 1035 1040 

Lys Leu Val Arg Ser His Leu Gln Val Met Thr Leu Ala Ile Gly Asp 
1045 1050 1055 

Gly Ala Asn Asp Val Ser Met Ile Gln Val Ala Asp Ile Gly Ile Gly 
1060 1065 1070 

Val Ser Gly Gln Glu Gly Met Gln Ala Val Met Ala Ser Asp Phe Ala 
1075 1080 1085 

Val Ser Gln Phe Lys His Leu Ser Lys Leu Leu Leu Val His Gly His 
1090 1095 1100 

Trp Cys Tyr Thr Arg Leu Ser Asn Met Ile Leu Tyr Phe Phe Tyr Lys 
1105 1110 1115 1120 

Asn Val Ala Tyr Val Asn Leu Leu Phe Trp Tyr Gln Phe Phe Cys Gly 
1125 1130 1135 

Phe Ser Gly Thr Ser Met Thr Asp Tyr Trp Val Leu Ile Phe Phe Asn 
1140 1145 1150 

Leu Leu Phe Thr Ser Ala Pro Pro Val Ile Tyr Gly Val Leu Glu Lys 
1155 1160 1165 

Asp Val Ser Ala Glu Thr Leu Met Gln Leu Pro Glu Leu Tyr Arg Ser 
1170 1175 1180 

Gly Gln Lys Ser Glu Ala Tyr Leu Pro His Thr Phe Trp Ile Thr Leu 
1185 1190 1195 1200 

Leu Asp Ala Phe Tyr Gln Ser Leu Val Cys Phe Phe Val Pro Tyr Phe 
1205 1210 1215 

Thr Tyr Gln Gly Ser Asp Thr Asp Ile Phe Ala Phe Gly Asn Pro Leu 
1220 1225 1230 

Asn Thr Ala Thr Leu Phe Ile Val Leu Leu His Leu Val Ile Glu Ser 
1235 1240 1245 

Lys Ser Leu Thr Arg Cys Ser Asp Ser His Leu Gln Phe Gln Ser Phe 
1250 1255 1260 

Gly Arg Leu Trp Ile Thr 
1265 1270 



<210> SEQ ID NO 11 

<211> LENGTH: 1269 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 11 

atgagtgaca ctgaatatgc agagtggctg aggaatcatt ttttagctga aaccagcatt 60 

gacaacaggg aagaattact acttgaatct gccatgaggt tggagaacaa acttacatta 120 

cttggtgcta ctggcattga agaccgtctg caggagggag tccctgaatc tatagaagct 180 

cttcacaaag cgggcatcaa gatctggatg ctgacagggg acaagcagga gacagctgtc 240 

aacatagctt atgcatgcaa actactggag ccagatgaca agctttttat cctcaatacc 300 

caaagtaaag atgcctgtgg gatgctgatg agcacaattt tgaaagaact tcagaagaaa 360 

actcaagccc tgccagagca agtgtcatta agtgaagatt tacttcagcc tcctgtcccc 420 

cgggactcag ggttacgagc tggactcatt atcactggga agaccctgga gtttgccctg 480 

caagaaagtc tgcaaaagca gttcctggaa ctgacatctt ggtgtcaagc tgtggtctgc 540 

tgccgagcca caccgctgca gaaaagtgaa gtggtgaaat tggtccgcag ccatctccag 600 

gtgatgaccc ttgctattgg tgatggtgcc aatgatgtta gcatgataca agtggcagac 660 

attgggatag gggtctcagg tcaagaaggc atgcaggctg tgatggccag tgactttgcc 720 

gtttctcagt tcaaacatct cagcaagctc cttcttgtcc atggacactg gtgttataca 780 

cggctttcca acatgattct ctattttttc tataagaatg tggcctatgt gaacctcctt 840 

ttctggtacc agttcttttg tggattttca ggaacatcca tgactgatta ctgggttttg 900 

atcttcttca acctcctctt cacatctgcc cctcctgtca tttatggtgt tttggagaaa 960 

gatgtgtctg cagagaccct catgcaactg cctgaacttt acagaagtgg tcagaaatca 1020 

gaggcatact taccccatac cttctggatc accttattgg atgcttttta tcaaagcctg 1080 

gtctgcttct ttgtgcctta ttttacctac cagggctcag atactgacat ctttgcattt 1140 

ggaaaccccc tgaacacagc cactctgttc atcgttctcc tccatctggt cattgaaagc 1200 

aagagtttga ccaggtgcag tgactcacac ctgcaattcc agagctttgg gaggctgtgg 1260 

atcacatga 1269 



<210> SEQ ID NO 12 

<211> LENGTH: 422 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 12 

Met Ser Asp Thr Glu Tyr Ala Glu Trp Leu Arg Asn His Phe Leu Ala 
1 5 10 15 

Glu Thr Ser Ile Asp Asn Arg Glu Glu Leu Leu Leu Glu Ser Ala Met 
20 25 30 

Arg Leu Glu Asn Lys Leu Thr Leu Leu Gly Ala Thr Gly Ile Glu Asp 
35 40 45 

Arg Leu Gln Glu Gly Val Pro Glu Ser Ile Glu Ala Leu His Lys Ala 
50 55 60 

Gly Ile Lys Ile Trp Met Leu Thr Gly Asp Lys Gln Glu Thr Ala Val 
65 70 75 80 

Asn Ile Ala Tyr Ala Cys Lys Leu Leu Glu Pro Asp Asp Lys Leu Phe 
85 90 95 

Ile Leu Asn Thr Gln Ser Lys Asp Ala Cys Gly Met Leu Met Ser Thr 
100 105 110 

Ile Leu Lys Glu Leu Gln Lys Lys Thr Gln Ala Leu Pro Glu Gln Val 
115 120 125 

Ser Leu Ser Glu Asp Leu Leu Gln Pro Pro Val Pro Arg Asp Ser Gly 
130 135 140 

Leu Arg Ala Gly Leu Ile Ile Thr Gly Lys Thr Leu Glu Phe Ala Leu 
145 150 155 160 

Gln Glu Ser Leu Gln Lys Gln Phe Leu Glu Leu Thr Ser Trp Cys Gln 
165 170 175 

Ala Val Val Cys Cys Arg Ala Thr Pro Leu Gln Lys Ser Glu Val Val 
180 185 190 

Lys Leu Val Arg Ser His Leu Gln Val Met Thr Leu Ala Ile Gly Asp 
195 200 205 

Gly Ala Asn Asp Val Ser Met Ile Gln Val Ala Asp Ile Gly Ile Gly 
210 215 220 

Val Ser Gly Gln Glu Gly Met Gln Ala Val Met Ala Ser Asp Phe Ala 
225 230 235 240 

Val Ser Gln Phe Lys His Leu Ser Lys Leu Leu Leu Val His Gly His 
245 250 255 

Trp Cys Tyr Thr Arg Leu Ser Asn Met Ile Leu Tyr Phe Phe Tyr Lys 
260 265 270 

Asn Val Ala Tyr Val Asn Leu Leu Phe Trp Tyr Gln Phe Phe Cys Gly 
275 280 285 

Phe Ser Gly Thr Ser Met Thr Asp Tyr Trp Val Leu Ile Phe Phe Asn 
290 295 300 

Leu Leu Phe Thr Ser Ala Pro Pro Val Ile Tyr Gly Val Leu Glu Lys 
305 310 315 320 

Asp Val Ser Ala Glu Thr Leu Met Gln Leu Pro Glu Leu Tyr Arg Ser 
325 330 335 

Gly Gln Lys Ser Glu Ala Tyr Leu Pro His Thr Phe Trp Ile Thr Leu 
340 345 350 

Leu Asp Ala Phe Tyr Gln Ser Leu Val Cys Phe Phe Val Pro Tyr Phe 
355 360 365 

Thr Tyr Gln Gly Ser Asp Thr Asp Ile Phe Ala Phe Gly Asn Pro Leu 
370 375 380 

Asn Thr Ala Thr Leu Phe Ile Val Leu Leu His Leu Val Ile Glu Ser 
385 390 395 400 

Lys Ser Leu Thr Arg Cys Ser Asp Ser His Leu Gln Phe Gln Ser Phe 
405 410 415 

Gly Arg Leu Trp Ile Thr 
420 



<210> SEQ ID NO 13 

<211> LENGTH: 4281 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 13 

atgactgagg ctctccaatg ggccagatat cactggcgac ggctgatcag aggtgcaacc 60 

agggatgatg attcagggcc atacaactat tcctcgttgc tcgcctgtgg gcgcaagtcc 120 

tctcagatcc ctaaactgtc aggaaggcac cggattgttg ttccccacat ccagcccttc 180 

aaggatgagt atgagaagtt ctccggagcc tatgtgaaca atcgaatacg aacaacaaag 240 

tacacacttc tgaattttgt gccaagaaat ttatttgaac aatttcacag agctgccaat 300 

ttatatttcc tgttcctagt tgtcctgaac tgggtacctt tggtagaagc cttccaaaag 360 

gaaatcacca tgttgcctct ggtggtggtc cttacaatta tcgcaattaa agatggcctg 420 

gaagattatc ggaaatacaa aattgacaaa cagatcaata atttaataac taaagtttat 480 

agtaggaaag agaaaaaata cattgaccga tgctggaaag acgttactgt tggggacttt 540 

attcgcctct cctgcaacga ggtcatccct gcagacatgg tactactctt ttccactgat 600 

ccagatggaa tctgtcacat tgagacttct ggtcttgatg gagagagcaa tttaaaacag 660 

aggcaggtgg ttcggggata tgcagaacag gactctgaag ttgatcctga gaagttttcc 720 

agtaggatag aatgtgaaag cccaaacaat gacctcagca gattccgagg cttcctagaa 780 

cattccaaca aagaacgcgt gggtctcagt aaagaaaatt tgttgcttag aggatgcacc 840 

attagaaaca cagaggctgt tgtgggcatt gtggtttatg caggccatga aaccaaagca 900 

atgctgaaca acagtgggcc acggtataag cgcagcaaat tagaaagaag agcaaacaca 960 

gatgtcctct ggtgtgtcat gcttctggtc ataatgtgct taactggcgc agtaggtcat 1020 

ggaatctggc tgagcaggta tgaaaagatg cattttttca atgttcccga gcctgatgga 1080 

catatcatat caccactgtt ggcaggattt tatatgtttt ggaccatgat cattttgtta 1140 

caggtcttga ttcctatttc tctctatgtt tccatcgaaa ttgtgaagct tggacaaata 1200 

tatttcattc aaagtgatgt ggatttctac aatgaaaaaa tggattctat tgttcagtgc 1260 

cgagccctga acatcgccga ggatctggga cagattcagt acctcttttc cgataagaca 1320 

ggaaccctca ctgagaataa gatggttttt cgaagatgta gtgtggcagg atttgattac 1380 

tgccatgaag aaaatgccag gaggttggag tcctatcagg aagctgtctc tgaagatgaa 1440 

gattttatag acacagtcag tggttccctc agcaatatgg caaaaccgag agcccccagc 1500 

tgcaggacag ttcataatgg gcctttggga aataagccct caaatcatct tgctgggagc 1560 

tcttttactc taggaagtgg agaaggagcc agtgaagtgc ctcattccag acaggctgct 1620 

ttcagtagcc ccattgaaac agacgtggta ccagacacca ggcttttaga caaatttagt 1680 

cagattacac ctcggctctt tatgccacta gatgagacca tccaaaatcc accaatggaa 1740 

actttgtaca ttatcgactt tttcattgca ttggcaattt gcaacacagt agtggtttct 1800 

gctcctaacc aaccccgaca aaagatcaga cacccttcac tgggggggtt gcccattaag 1860 

tctttggaag agattaaaag tcttttccag agatggtctg tccgaagatc aagttctcca 1920 

tcgcttaaca gtgggaaaga gccatcttct ggagttccaa acgcctttgt gagcagactc 1980 

cctctcttta gtcgaatgaa accagcttca cctgtggagg aagaggtctc ccaggtgtgt 2040 

gagagccccc agtgctccag tagctcagct tgctgcacag aaacagagaa acaacacggt 2100 

gatgcaggcc tcctgaatgg caaggcagag tccctccctg gacagccatt ggcctgcaac 2160 

ctgtgttatg aggccgagag cccagacgaa gcggccttag tgtatgccgc cagggcttac 2220 

caatgcactt tacggtctcg gacaccagag caggtcatgg tggactttgc tgctttggga 2280 

ccattaacat ttcaactcct acacatcctg ccctttgact cagtaagaaa aagaatgtct 2340 

gttgtggtcc gacaccctct ttccaatcaa gttgtggtgt atacgaaagg cgctgattct 2400 

gtgatcatgg agttactgtc ggtggcttcc ccagatggag caagtctgga gaaacaacag 2460 

atgatagtaa gggagaaaac ccagaagcac ttggatgact atgccaaaca aggccttcgt 2520 

actttatgta tagcaaagaa ggtcatgagt gacactgaat atgcagagtg gctgaggaat 2580 

cattttttag ctgaaaccag cattgacaac agggaagaat tactacttga atctgccatg 2640 

aggttggaga acaaacttac attacttggt gctactggca ttgaagaccg tctgcaggag 2700 

ggagtccctg aatctataga agctcttcac aaagcgggca tcaagatctg gatgctgaca 2760 

ggggacaagc aggagacagc tgtcaacata gcttatgcat gcaaactact ggagccagat 2820 

gacaagcttt ttatcctcaa tacccaaagt aaagatgcct gtgggatgct gatgagcaca 2880 

attttgaaag aacttcagaa gaaaactcaa gccctgccag agcaagtgtc attaagtgaa 2940 

gatttacttc agcctcctgt cccccgggac tcagggttac gagctggact cattatcact 3000 

gggaagaccc tggagtttgc cctgcaagaa agtctgcaaa agcagttcct ggaactgaca 3060 

tcttggtgtc aagctgtggt ctgctgccga gccacaccgc tgcagaaaag tgaagtggtg 3120 

aaattggtcc gcagccatct ccaggtgatg acccttgcta ttggtgatgg tgccaatgat 3180 

gttagcatga tacaagtggc agacattggg ataggggtct caggtcaaga aggcatgcag 3240 

gctgtgatgg ccagtgactt tgccgtttct cagttcaaac atctcagcaa gctccttctt 3300 

gtccatggac actggtgtta tacacggctt tccaacatga ttctctattt tttctataag 3360 

aatgtggcct atgtgaacct ccttttctgg taccagttct tttgtggatt ttcaggaaca 3420 

tccatgactg attactgggt tttgatcttc ttcaacctcc tcttcacatc tgcccctcct 3480 

gtcatttatg gtgttttgga gaaagatgtg tctgcagaga ccctcatgca actgcctgaa 3540 

ctttacagaa gtggtcagaa atcagaggca tacttacccc ataccttctg gatcacctta 3600 

ttggatgctt tttatcaaag cctggtctgc ttctttgtgc cttattttac ctaccagggc 3660 

tcagatactg acatctttgc atttggaaac cccctgaaca cagccactct gttcatcgtt 3720 

ctcctccatc tggtcattga aagcaagagt ttgacttgga ttcacttgct ggtcatcatt 3780 

ggtagcatct tgtcttattt tttatttgcc atagtttttg gagccatgtg tgtaacttgc 3840 

aacccaccat ccaaccctta ctggattatg caggagcaca tgctggatcc agtattctac 3900 

ttagtttgta tcctcacgac gtccattgct cttctgccca ggtttgtata cagagttctt 3960 

cagggatccc tgtttccatc tccaattctg agagctaagc actttgacag actaactcca 4020 

gaggagagga ctaaagctct caagaagtgg agaggggctg gaaagatgaa tcaagtgaca 4080 

tcaaagtatg ctaaccaatc agctggcaag tcaggaagaa gacccatgcc tggcccttct 4140 

gctgtatttg caatgaagtc agcaacttcc tgtgctattg agcaaggaaa cttatctctg 4200 

tgtgaaactg ctttagatca aggctactct gaaactaagg cctttgagat ggctggaccc 4260 

tccaaaggta aagaaagcta g 4281 



<210> SEQ ID NO 14 

<211> LENGTH: 1426 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 14 

Met Thr Glu Ala Leu Gln Trp Ala Arg Tyr His Trp Arg Arg Leu Ile 
1 5 10 15 

Arg Gly Ala Thr Arg Asp Asp Asp Ser Gly Pro Tyr Asn Tyr Ser Ser 
20 25 30 

Leu Leu Ala Cys Gly Arg Lys Ser Ser Gln Ile Pro Lys Leu Ser Gly 
35 40 45 

Arg His Arg Ile Val Val Pro His Ile Gln Pro Phe Lys Asp Glu Tyr 
50 55 60 

Glu Lys Phe Ser Gly Ala Tyr Val Asn Asn Arg Ile Arg Thr Thr Lys 
65 70 75 80 

Tyr Thr Leu Leu Asn Phe Val Pro Arg Asn Leu Phe Glu Gln Phe His 
85 90 95 

Arg Ala Ala Asn Leu Tyr Phe Leu Phe Leu Val Val Leu Asn Trp Val 
100 105 110 

Pro Leu Val Glu Ala Phe Gln Lys Glu Ile Thr Met Leu Pro Leu Val 
115 120 125 

Val Val Leu Thr Ile Ile Ala Ile Lys Asp Gly Leu Glu Asp Tyr Arg 
130 135 140 

Lys Tyr Lys Ile Asp Lys Gln Ile Asn Asn Leu Ile Thr Lys Val Tyr 
145 150 155 160 

Ser Arg Lys Glu Lys Lys Tyr Ile Asp Arg Cys Trp Lys Asp Val Thr 
165 170 175 

Val Gly Asp Phe Ile Arg Leu Ser Cys Asn Glu Val Ile Pro Ala Asp 
180 185 190 

Met Val Leu Leu Phe Ser Thr Asp Pro Asp Gly Ile Cys His Ile Glu 
195 200 205 

Thr Ser Gly Leu Asp Gly Glu Ser Asn Leu Lys Gln Arg Gln Val Val 
210 215 220 

Arg Gly Tyr Ala Glu Gln Asp Ser Glu Val Asp Pro Glu Lys Phe Ser 
225 230 235 240 

Ser Arg Ile Glu Cys Glu Ser Pro Asn Asn Asp Leu Ser Arg Phe Arg 
245 250 255 

Gly Phe Leu Glu His Ser Asn Lys Glu Arg Val Gly Leu Ser Lys Glu 
260 265 270 

Asn Leu Leu Leu Arg Gly Cys Thr Ile Arg Asn Thr Glu Ala Val Val 
275 280 285 

Gly Ile Val Val Tyr Ala Gly His Glu Thr Lys Ala Met Leu Asn Asn 
290 295 300 

Ser Gly Pro Arg Tyr Lys Arg Ser Lys Leu Glu Arg Arg Ala Asn Thr 
305 310 315 320 

Asp Val Leu Trp Cys Val Met Leu Leu Val Ile Met Cys Leu Thr Gly 
325 330 335 

Ala Val Gly His Gly Ile Trp Leu Ser Arg Tyr Glu Lys Met His Phe 
340 345 350 

Phe Asn Val Pro Glu Pro Asp Gly His Ile Ile Ser Pro Leu Leu Ala 
355 360 365 

Gly Phe Tyr Met Phe Trp Thr Met Ile Ile Leu Leu Gln Val Leu Ile 
370 375 380 

Pro Ile Ser Leu Tyr Val Ser Ile Glu Ile Val Lys Leu Gly Gln Ile 
385 390 395 400 

Tyr Phe Ile Gln Ser Asp Val Asp Phe Tyr Asn Glu Lys Met Asp Ser 
405 410 415 

Ile Val Gln Cys Arg Ala Leu Asn Ile Ala Glu Asp Leu Gly Gln Ile 
420 425 430 

Gln Tyr Leu Phe Ser Asp Lys Thr Gly Thr Leu Thr Glu Asn Lys Met 
435 440 445 

Val Phe Arg Arg Cys Ser Val Ala Gly Phe Asp Tyr Cys His Glu Glu 
450 455 460 

Asn Ala Arg Arg Leu Glu Ser Tyr Gln Glu Ala Val Ser Glu Asp Glu 
465 470 475 480 

Asp Phe Ile Asp Thr Val Ser Gly Ser Leu Ser Asn Met Ala Lys Pro 
485 490 495 

Arg Ala Pro Ser Cys Arg Thr Val His Asn Gly Pro Leu Gly Asn Lys 
500 505 510 

Pro Ser Asn His Leu Ala Gly Ser Ser Phe Thr Leu Gly Ser Gly Glu 
515 520 525 

Gly Ala Ser Glu Val Pro His Ser Arg Gln Ala Ala Phe Ser Ser Pro 
530 535 540 

Ile Glu Thr Asp Val Val Pro Asp Thr Arg Leu Leu Asp Lys Phe Ser 
545 550 555 560 

Gln Ile Thr Pro Arg Leu Phe Met Pro Leu Asp Glu Thr Ile Gln Asn 
565 570 575 

Pro Pro Met Glu Thr Leu Tyr Ile Ile Asp Phe Phe Ile Ala Leu Ala 
580 585 590 

Ile Cys Asn Thr Val Val Val Ser Ala Pro Asn Gln Pro Arg Gln Lys 
595 600 605 

Ile Arg His Pro Ser Leu Gly Gly Leu Pro Ile Lys Ser Leu Glu Glu 
610 615 620 

Ile Lys Ser Leu Phe Gln Arg Trp Ser Val Arg Arg Ser Ser Ser Pro 
625 630 635 640 

Ser Leu Asn Ser Gly Lys Glu Pro Ser Ser Gly Val Pro Asn Ala Phe 
645 650 655 

Val Ser Arg Leu Pro Leu Phe Ser Arg Met Lys Pro Ala Ser Pro Val 
660 665 670 

Glu Glu Glu Val Ser Gln Val Cys Glu Ser Pro Gln Cys Ser Ser Ser 
675 680 685 

Ser Ala Cys Cys Thr Glu Thr Glu Lys Gln His Gly Asp Ala Gly Leu 
690 695 700 

Leu Asn Gly Lys Ala Glu Ser Leu Pro Gly Gln Pro Leu Ala Cys Asn 
705 710 715 720 

Leu Cys Tyr Glu Ala Glu Ser Pro Asp Glu Ala Ala Leu Val Tyr Ala 
725 730 735 

Ala Arg Ala Tyr Gln Cys Thr Leu Arg Ser Arg Thr Pro Glu Gln Val 
740 745 750 

Met Val Asp Phe Ala Ala Leu Gly Pro Leu Thr Phe Gln Leu Leu His 
755 760 765 

Ile Leu Pro Phe Asp Ser Val Arg Lys Arg Met Ser Val Val Val Arg 
770 775 780 

His Pro Leu Ser Asn Gln Val Val Val Tyr Thr Lys Gly Ala Asp Ser 
785 790 795 800 

Val Ile Met Glu Leu Leu Ser Val Ala Ser Pro Asp Gly Ala Ser Leu 
805 810 815 

Glu Lys Gln Gln Met Ile Val Arg Glu Lys Thr Gln Lys His Leu Asp 
820 825 830 

Asp Tyr Ala Lys Gln Gly Leu Arg Thr Leu Cys Ile Ala Lys Lys Val 
835 840 845 

Met Ser Asp Thr Glu Tyr Ala Glu Trp Leu Arg Asn His Phe Leu Ala 
850 855 860 

Glu Thr Ser Ile Asp Asn Arg Glu Glu Leu Leu Leu Glu Ser Ala Met 
865 870 875 880 

Arg Leu Glu Asn Lys Leu Thr Leu Leu Gly Ala Thr Gly Ile Glu Asp 
885 890 895 

Arg Leu Gln Glu Gly Val Pro Glu Ser Ile Glu Ala Leu His Lys Ala 
900 905 910 

Gly Ile Lys Ile Trp Met Leu Thr Gly Asp Lys Gln Glu Thr Ala Val 
915 920 925 

Asn Ile Ala Tyr Ala Cys Lys Leu Leu Glu Pro Asp Asp Lys Leu Phe 
930 935 940 

Ile Leu Asn Thr Gln Ser Lys Asp Ala Cys Gly Met Leu Met Ser Thr 
945 950 955 960 

Ile Leu Lys Glu Leu Gln Lys Lys Thr Gln Ala Leu Pro Glu Gln Val 
965 970 975 

Ser Leu Ser Glu Asp Leu Leu Gln Pro Pro Val Pro Arg Asp Ser Gly 
980 985 990 

Leu Arg Ala Gly Leu Ile Ile Thr Gly Lys Thr Leu Glu Phe Ala Leu 
995 1000 1005 

Gln Glu Ser Leu Gln Lys Gln Phe Leu Glu Leu Thr Ser Trp Cys Gln 
1010 1015 1020 

Ala Val Val Cys Cys Arg Ala Thr Pro Leu Gln Lys Ser Glu Val Val 
1025 1030 1035 1040 

Lys Leu Val Arg Ser His Leu Gln Val Met Thr Leu Ala Ile Gly Asp 
1045 1050 1055 

Gly Ala Asn Asp Val Ser Met Ile Gln Val Ala Asp Ile Gly Ile Gly 
1060 1065 1070 

Val Ser Gly Gln Glu Gly Met Gln Ala Val Met Ala Ser Asp Phe Ala 
1075 1080 1085 

Val Ser Gln Phe Lys His Leu Ser Lys Leu Leu Leu Val His Gly His 
1090 1095 1100 

Trp Cys Tyr Thr Arg Leu Ser Asn Met Ile Leu Tyr Phe Phe Tyr Lys 
1105 1110 1115 1120 

Asn Val Ala Tyr Val Asn Leu Leu Phe Trp Tyr Gln Phe Phe Cys Gly 
1125 1130 1135 

Phe Ser Gly Thr Ser Met Thr Asp Tyr Trp Val Leu Ile Phe Phe Asn 
1140 1145 1150 

Leu Leu Phe Thr Ser Ala Pro Pro Val Ile Tyr Gly Val Leu Glu Lys 
1155 1160 1165 

Asp Val Ser Ala Glu Thr Leu Met Gln Leu Pro Glu Leu Tyr Arg Ser 
1170 1175 1180 

Gly Gln Lys Ser Glu Ala Tyr Leu Pro His Thr Phe Trp Ile Thr Leu 
1185 1190 1195 1200 

Leu Asp Ala Phe Tyr Gln Ser Leu Val Cys Phe Phe Val Pro Tyr Phe 
1205 1210 1215 

Thr Tyr Gln Gly Ser Asp Thr Asp Ile Phe Ala Phe Gly Asn Pro Leu 
1220 1225 1230 

Asn Thr Ala Thr Leu Phe Ile Val Leu Leu His Leu Val Ile Glu Ser 
1235 1240 1245 

Lys Ser Leu Thr Trp Ile His Leu Leu Val Ile Ile Gly Ser Ile Leu 
1250 1255 1260 

Ser Tyr Phe Leu Phe Ala Ile Val Phe Gly Ala Met Cys Val Thr Cys 
1265 1270 1275 1280 

Asn Pro Pro Ser Asn Pro Tyr Trp Ile Met Gln Glu His Met Leu Asp 
1285 1290 1295 

Pro Val Phe Tyr Leu Val Cys Ile Leu Thr Thr Ser Ile Ala Leu Leu 
1300 1305 1310 

Pro Arg Phe Val Tyr Arg Val Leu Gln Gly Ser Leu Phe Pro Ser Pro 
1315 1320 1325 

Ile Leu Arg Ala Lys His Phe Asp Arg Leu Thr Pro Glu Glu Arg Thr 
1330 1335 1340 

Lys Ala Leu Lys Lys Trp Arg Gly Ala Gly Lys Met Asn Gln Val Thr 
1345 1350 1355 1360 

Ser Lys Tyr Ala Asn Gln Ser Ala Gly Lys Ser Gly Arg Arg Pro Met 
1365 1370 1375 

Pro Gly Pro Ser Ala Val Phe Ala Met Lys Ser Ala Thr Ser Cys Ala 
1380 1385 1390 

Ile Glu Gln Gly Asn Leu Ser Leu Cys Glu Thr Ala Leu Asp Gln Gly 
1395 1400 1405 

Tyr Ser Glu Thr Lys Ala Phe Glu Met Ala Gly Pro Ser Lys Gly Lys 
1410 1415 1420 

Glu Ser 
1425 



<210> SEQ ID NO 15 

<211> LENGTH: 1737 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 15 

atgagtgaca ctgaatatgc agagtggctg aggaatcatt ttttagctga aaccagcatt 60 

gacaacaggg aagaattact acttgaatct gccatgaggt tggagaacaa acttacatta 120 

cttggtgcta ctggcattga agaccgtctg caggagggag tccctgaatc tatagaagct 180 

cttcacaaag cgggcatcaa gatctggatg ctgacagggg acaagcagga gacagctgtc 240 

aacatagctt atgcatgcaa actactggag ccagatgaca agctttttat cctcaatacc 300 

caaagtaaag atgcctgtgg gatgctgatg agcacaattt tgaaagaact tcagaagaaa 360 

actcaagccc tgccagagca agtgtcatta agtgaagatt tacttcagcc tcctgtcccc 420 

cgggactcag ggttacgagc tggactcatt atcactggga agaccctgga gtttgccctg 480 

caagaaagtc tgcaaaagca gttcctggaa ctgacatctt ggtgtcaagc tgtggtctgc 540 

tgccgagcca caccgctgca gaaaagtgaa gtggtgaaat tggtccgcag ccatctccag 600 

gtgatgaccc ttgctattgg tgatggtgcc aatgatgtta gcatgataca agtggcagac 660 

attgggatag gggtctcagg tcaagaaggc atgcaggctg tgatggccag tgactttgcc 720 

gtttctcagt tcaaacatct cagcaagctc cttcttgtcc atggacactg gtgttataca 780 

cggctttcca acatgattct ctattttttc tataagaatg tggcctatgt gaacctcctt 840 

ttctggtacc agttcttttg tggattttca ggaacatcca tgactgatta ctgggttttg 900 

atcttcttca acctcctctt cacatctgcc cctcctgtca tttatggtgt tttggagaaa 960 

gatgtgtctg cagagaccct catgcaactg cctgaacttt acagaagtgg tcagaaatca 1020 

gaggcatact taccccatac cttctggatc accttattgg atgcttttta tcaaagcctg 1080 

gtctgcttct ttgtgcctta ttttacctac cagggctcag atactgacat ctttgcattt 1140 

ggaaaccccc tgaacacagc cactctgttc atcgttctcc tccatctggt cattgaaagc 1200 

aagagtttga cttggattca cttgctggtc atcattggta gcatcttgtc ttatttttta 1260 

tttgccatag tttttggagc catgtgtgta acttgcaacc caccatccaa cccttactgg 1320 

attatgcagg agcacatgct ggatccagta ttctacttag tttgtatcct cacgacgtcc 1380 

attgctcttc tgcccaggtt tgtatacaga gttcttcagg gatccctgtt tccatctcca 1440 

attctgagag ctaagcactt tgacagacta actccagagg agaggactaa agctctcaag 1500 

aagtggagag gggctggaaa gatgaatcaa gtgacatcaa agtatgctaa ccaatcagct 1560 

ggcaagtcag gaagaagacc catgcctggc ccttctgctg tatttgcaat gaagtcagca 1620 

acttcctgtg ctattgagca aggaaactta tctctgtgtg aaactgcttt agatcaaggc 1680 

tactctgaaa ctaaggcctt tgagatggct ggaccctcca aaggtaaaga aagctag 1737 



<210> SEQ ID NO 16 

<211> LENGTH: 578 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 16 

Met Ser Asp Thr Glu Tyr Ala Glu Trp Leu Arg Asn His Phe Leu Ala
1 5 10 15

Glu Thr Ser Ile Asp Asn Arg Glu Glu Leu Leu Leu Glu Ser Ala Met
20 25 30

Arg Leu Glu Asn Lys Leu Thr Leu Leu Gly Ala Thr Gly Ile Glu Asp
35 40 45

Arg Leu Gln Glu Gly Val Pro Glu Ser Ile Glu Ala Leu His Lys Ala
50 55 60

Gly Ile Lys Ile Trp Met Leu Thr Gly Asp Lys Gln Glu Thr Ala Val
65 70 75 80

Asn Ile Ala Tyr Ala Cys Lys Leu Leu Glu Pro Asp Asp Lys Leu Phe
85 90 95

Ile Leu Asn Thr Gln Ser Lys Asp Ala Cys Gly Met Leu Met Ser Thr
100 105 110

Ile Leu Lys Glu Leu Gln Lys Lys Thr Gln Ala Leu Pro Glu Gln Val
115 120 125

Ser Leu Ser Glu Asp Leu Leu Gln Pro Pro Val Pro Arg Asp Ser Gly
130 135 140

Leu Arg Ala Gly Leu Ile Ile Thr Gly Lys Thr Leu Glu Phe Ala Leu
145 150 155 160

Gln Glu Ser Leu Gln Lys Gln Phe Leu Glu Leu Thr Ser Trp Cys Gln
165 170 175

Ala Val Val Cys Cys Arg Ala Thr Pro Leu Gln Lys Ser Glu Val Val
180 185 190

Lys Leu Val Arg Ser His Leu Gln Val Met Thr Leu Ala Ile Gly Asp
195 200 205

Gly Ala Asn Asp Val Ser Met Ile Gln Val Ala Asp Ile Gly Ile Gly
210 215 220

Val Ser Gly Gln Glu Gly Met Gln Ala Val Met Ala Ser Asp Phe Ala
225 230 235 240

Val Ser Gln Phe Lys His Leu Ser Lys Leu Leu Leu Val His Gly His
245 250 255

Trp Cys Tyr Thr Arg Leu Ser Asn Met Ile Leu Tyr Phe Phe Tyr Lys
260 265 270

Asn Val Ala Tyr Val Asn Leu Leu Phe Trp Tyr Gln Phe Phe Cys Gly
275 280 285

Phe Ser Gly Thr Ser Met Thr Asp Tyr Trp Val Leu Ile Phe Phe Asn
290 295 300

Leu Leu Phe Thr Ser Ala Pro Pro Val Ile Tyr Gly Val Leu Glu Lys
305 310 315 320

Asp Val Ser Ala Glu Thr Leu Met Gln Leu Pro Glu Leu Tyr Arg Ser
325 330 335

Gly Gln Lys Ser Glu Ala Tyr Leu Pro His Thr Phe Trp Ile Thr Leu
340 345 350

Leu Asp Ala Phe Tyr Gln Ser Leu Val Cys Phe Phe Val Pro Tyr Phe
355 360 365

Thr Tyr Gln Gly Ser Asp Thr Asp Ile Phe Ala Phe Gly Asn Pro Leu
370 375 380

Asn Thr Ala Thr Leu Phe Ile Val Leu Leu His Leu Val Ile Glu Ser
385 390 395 400

Lys Ser Leu Thr Trp Ile His Leu Leu Val Ile Ile Gly Ser Ile Leu
405 410 415

Ser Tyr Phe Leu Phe Ala Ile Val Phe Gly Ala Met Cys Val Thr Cys
420 425 430

Asn Pro Pro Ser Asn Pro Tyr Trp Ile Met Gln Glu His Met Leu Asp
435 440 445

Pro Val Phe Tyr Leu Val Cys Ile Leu Thr Thr Ser Ile Ala Leu Leu
450 455 460

Pro Arg Phe Val Tyr Arg Val Leu Gln Gly Ser Leu Phe Pro Ser Pro
465 470 475 480

Ile Leu Arg Ala Lys His Phe Asp Arg Leu Thr Pro Glu Glu Arg Thr
485 490 495

Lys Ala Leu Lys Lys Trp Arg Gly Ala Gly Lys Met Asn Gln Val Thr
500 505 510

Ser Lys Tyr Ala Asn Gln Ser Ala Gly Lys Ser Gly Arg Arg Pro Met
515 520 525

Pro Gly Pro Ser Ala Val Phe Ala Met Lys Ser Ala Thr Ser Cys Ala
530 535 540

Ile Glu Gln Gly Asn Leu Ser Leu Cys Glu Thr Ala Leu Asp Gln Gly
545 550 555 560

Tyr Ser Glu Thr Lys Ala Phe Glu Met Ala Gly Pro Ser Lys Gly Lys
565 570 575

Glu Ser
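
The stated lengths of SEQ ID NO: 15 and SEQ ID NO: 16 are mutually consistent: 578 codons plus one stop codon give 578 x 3 + 3 = 1737 nucleotides. The following minimal Python sketch is an illustration only and is not part of the listing; the names `translate` and `FIRST_ROW_SEQ15` are assumptions introduced here. It translates the first 60 nucleotides of SEQ ID NO: 15 with the standard genetic code so the result can be compared with the first 20 residues of SEQ ID NO: 16.

```python
from itertools import product

# Standard genetic code in TCAG order (illustrative helper, not from the listing).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in reading frame 1 to one-letter codes ('*' marks a stop)."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    usable = len(dna) - len(dna) % 3    # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# First row of SEQ ID NO: 15 as printed in the listing (positions 1-60).
FIRST_ROW_SEQ15 = (
    "atgagtgaca ctgaatatgc agagtggctg aggaatcatt ttttagctga aaccagcatt"
)

print(translate(FIRST_ROW_SEQ15))  # compare with Met Ser Asp Thr Glu Tyr ...
print(578 * 3 + 3)                 # expected nucleotide length including the stop codon
```

The same helper could be applied to the full 1737-nucleotide sequence once the rows are concatenated; only the first row is checked here.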



<210> SEQ ID NO 17 

<211> LENGTH: 5958 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 
<400> SEQUENCE: 17

gtcagctaca caacctggat cttaccacag tttggatatg actgaggctc tccaatgggc 60

cagatatcac tggcgacggc tgatcagagg tgcaaccagg gatgatgatt cagggccata 120

caactattcc tcgttgctcg cctgtgggcg caagtcctct cagatcccta aactgtcagg 180

aaggcaccgg attgttgttc cccacatcca gcccttcaag gatgagtatg agaagttctc 240

cggagcctat gtgaacaatc gaatacgaac aacaaagtac acacttctga attttgtgcc 300

aagaaattta tttgaacaat ttcacagagc tgccaattta tatttcctgt tcctagttgt 360

cctgaactgg gtacctttgg tagaagcctt ccaaaaggaa atcaccatgt tgcctctggt 420

ggtggtcctt acaattatcg caattaaaga tggcctggaa gattatcgga aatacaaaat 480

tgacaaacag atcaataatt taataactaa agtttatagt aggaaagaga aaaaatacat 540

tgaccgatgc tggaaagacg ttactgttgg ggactttatt cgcctctcct gcaacgaggt 600

catccctgca gacatggtac tactcttttc cactgatcca gatggaatct gtcacattga 660

gacttctggt cttgatggag agagcaattt aaaacagagg caggtggttc ggggatatgc 720

agaacaggac tctgaagttg atcctgagaa gttttccagt aggatagaat gtgaaagccc 780

aaacaatgac ctcagcagat tccgaggctt cctagaacat tccaacaaag aacgcgtggg 840

tctcagtaaa gaaaatttgt tgcttagagg atgcaccatt agaaacacag aggctgttgt 900

gggcattgtg gtttatgcag gccatgaaac caaagcaatg ctgaacaaca gtgggccacg 960

gtataagcgc agcaaattag aaagaagagc aaacacagat gtcctctggt gtgtcatgct 1020

tctggtcata atgtgcttaa ctggcgcagt aggtcatgga atctggctga gcaggtatga 1080

aaagatgcat tttttcaatg ttcccgagcc tgatggacat atcatatcac cactgttggc 1140

aggattttat atgttttgga ccatgatcat tttgttacag gtcttgattc ctatttctct 1200

ctatgtttcc atcgaaattg tgaagcttgg acaaatatat ttcattcaaa gtgatgtgga 1260

tttctacaat gaaaaaatgg attctattgt tcagtgccga gccctgaaca tcgccgagga 1320

tctgggacag attcagtacc tcttttccga taagacagga accctcactg agaataagat 1380

ggtttttcga agatgtagtg tggcaggatt tgattactgc catgaagaaa atgccaggag 1440

gttggagtcc tatcaggaag ctgtctctga agatgaagat tttatagaca cagtcagtgg 1500

ttccctcagc aatatggcaa aaccgagagc ccccagctgc aggacagttc ataatgggcc 1560

tttgggaaat aagccctcaa atcatcttgc tgggagctct tttactctag gaagtggaga 1620

aggagccagt gaagtgcctc attccagaca ggctgctttc agtagcccca ttgaaacaga 1680

cgtggtacca gacaccaggc ftttagacaa atttagtcag attacacctc ggctctttat 1740

gccactagat gagaccatcc aaaatccacc aatggaaact ttgtacatta tcgacttttt 1800

cattgcattg gcaatttgca acacagtagt ggtttctgct cctaaccaac cccgacaaaa 1860

gatcagacac ccttcactgg gggggttgcc cattaagtct ttggaagaga ttaaaagtct 1920

tttccagaga tggtctgtcc gaagatcaag ttctccatcg cttaacagtg ggaaagagcc 1980

atcttctgga gttccaaacg cctttgtgag cagactccct ctctttagtc gaatgaaacc 2040

agcttcacct gtggaggaag aggtctccca ggtgtgtgag agcccccagt gctccagtag 2100

ctcagcttgc tgcacagaaa cagagaaaca acacggtgat gcaggcctcc tgaatggcaa 2160

ggcagagtcc ctccctggac agccattggc ctgcaacctg tgttatgagg ccgagagccc 2220

agacgaagcg gccttagtgt atgccgccag ggcttaccaa tgcactttac ggtctcggac 2280
accagagcag [illegible] actttgctgc tttgggacca ttaacatttc aactcctaca 2340

catcctgccc tttgactcag taagaaaaag aatgtctgtt otaatccaac accctctttc 2400

caatcaagtt atacTtatata cgaaaggcgc tgattctgtg atcatggagt tactgtcggt 2460

ggcttcccca aatcjQaqcaa gtctggagaa acaacagatg ataataaaaa agaaaaccca 2520

gaagcacttt tttcttccat ttcaggtgtc gtgaaaagct tgaattcggc gegecagata 2580

tcacgcgtgc caaaacractcr gctcaggatg actatgccaa acaaggcctt cgtactttat 2640

gtatagcaaa aaaaatcata agtgacactg aatatgcaga [illegible] aatcattttt 2700

tagctgaaac cagcattgac aacagggaag aattactact tgaatctgcc atgaggttgg 2760

agaacaaact tacattactt ggtgctactg gcattgaaga ccgtctgcag gagggagtcc 2820

ctgaatctat agaagctctt cacaaagcgg gcatcaagat ctggatgctg acaggggaca 2880

agcaggagac agctgtcaac atagcttatg catgcaaact actggagcca [illegible] 2940

tttttatcct caatacccaa agtaaagtgc gtatattgag [illegible] [illegible] 3000

[illegible] [illegible] [illegible] cfatgtatgca aggattaaaa aaatgcctgt 3060

gggatgctga tgagcacaat tttgaaagaa cttcagaaga [illegible] cctgccagag 3120

caagtgtcat taagtgaaga tttacttcag cctcctgtcc [illegible] [illegible] 3180

gctggactca ttatcactgg gaagaccctg gagtttgccc [illegible] [illegible] 3240

cagttcctgg aactgacatc ttggtgtcaa gctgtggtct gctgccgagc cacaccgctg 3300

cagaaaagtg aaataataaa attggtccgc agccatctcc aggtgatgac ccttgctatt 3360

aatcraataaa gatgaatctg agtcctgctc ttctcccttt cacaccacac cagacaccga 3420

tccttctgtc tctttcttct cccactgttc cttccatttt cctcctccct [illegible] 3480

cacattcatg ccttcccatc acctatttga gcaccttcct ccatcaccta tttgagcacc 3540

ttctgtgaac caggtaatag ggatgtgaca tggtaaacaa tacagtagtc cagacttctt 3600

agttcagtgt cagaccccca aatcaacaag cttaaatcaa gtaataaact gaatcacaga 3660

actgaaaaat ccatgtgttc taccttcagg aaagctaaat [illegible] [illegible] 3720

ttctttatcc attccacaag tatttatcaa gtgccttttt [illegible] [illegible] 3780

atggagatac aagagtatat aaaattggca aactaccttt [illegible] cttacateta 3840

cttactaaaaac atgcagttaa acaaagcata atctgtcagg ttcaggtagt gataagtact 3900

attggaaaaa taagtggatg aggacacgta tagcactgga [illegible] [illegible] 3960

taaatcgatt tcaagagcta ctgtaagttg actgggagca gagatgtgaa ggaaatcata 4020

[illegible] gagacatggt ggtgccaatg atgttagcat gatacaagtg gcagacattg 4080

[illegible] ctcaggtcaa gaaggcatgc aggctgtgat [illegible] [illegible] 4140

ctcagttcaa acatctcagc aagctccttc ttgtccatgg [illegible] tatacacggc 4200


tttccaacat gattctctat tttttctata agaatgtggc ctatgtgaac ctccttttct 4260

ggtaccagtt cttttgtgga ttttcaggaa catccatgac tgattactgg gttttgatct 4320

tcttcaacct cctcttcaca tctgcccctc ctgtcattta tggtgttttg gagaaagatg 4380

tgtctgcaga gaccctcatg caactgcctg aactttacag aagtggtcag aaatcagagg 4440

catacttacc ccataccttc tggatcacct tattggatgc tttttatcaa agcctggtct 4500

gcttctttgt gccttatttt acctaccagg gctcagatac tgacatcttt gcatttggaa 4560

accccctgaa cacagccact ctgttcatcg ttctcctcca tctggtcatt gaaagcaaga 4620

gtttgaccag gtgcagtgac tcacacctgc aattccagag ctttgggagg ctgtggatca 4680

catgaagcta agagttcaag accagcctgg gcaacataac ttggattcac ttgctggtca 4740

tcattggtag catcttgtct tattttttat ttgccatagt ttttggagcc atgtgtgtaa 4800

cttgcaaccc accatccaac ccttactgga ttatgcagga gcacatgctg gatccagtat 4860

tctacttagt ttgtatcctc acgacgtcca ttgctcttct gcccaggttt gtatacagag 4920

ttcttcaggg atccctgttt ccatctccaa ttctgagagc taagcacttt gacagactaa 4980

ctccagagga gaggactaaa gctctcaaga agtggagagg ggctggaaag atgaatcaag 5040

tgacatcaaa gtatgctaac caatcagctg gcaagtcagg aagaagaccc atgcctggcc 5100

cttctgctgt atttgcaatg aagtcagcaa cttcctgtgc tattgagcaa ggaaacttat 5160

ctctgtgtga aactgcttta gatcaaggct actctgaaac taaggccttt gagatggctg 5220

gaccctccaa aggtaaagaa agctagatac cctccttgga gttgcaagta ttctttcaag 5280

gttggaagag ggattttgaa gaggtatctc tccaagcaag aatgacttgt ttttccataa 5340

gggacatgag cattttacta ggcttggaag agctgacatg atgagcatta ttgtatgttt 5400

gtatatacat ttgtgataga gggctagagt ttgacctaga gagagtttaa ggaagtgaaa 5460

tatttaattc agaaccaaat gcttttgtaa aactttttgg attttgtaaa agcattttca 5520

ttctcttaga aattcaagta ttttcaaggg gagtcatttg agatatattt attttactag 5580

gagatcttat attctaggga aatgctttaa atggtcaggc tccaatcgga atttttttaa 5640

gaaaaaagta gtttttaata cattggttag gactcagagg aaatacggaa aaaacattgt 5700

agatggtaat ttacagataa aatcccaaga gcctttaaac aacaaggtac ctaaataggg 5760

tataattata ctgcttaaaa tacaggtagt gcctattaat agctttttat ttcctatggg 5820

gagatgcttt ggtcttctgg ctgagatgta ggcatacctc tcactcattt caatgctttc 5880

ctgaggtgga gccttcattg gaaaggggaa agagggttct aggttcatca gggaccagga 5940

atgctttcct ctggcagg 5958
What is claimed is:

1. An isolated nucleic acid molecule comprising the novel
human ATPase nucleotide sequence described in SEQ ID
NO:13.

2. An isolated nucleic acid molecule comprising a novel
human ATPase nucleotide sequence that:

(a) encodes the amino acid sequence shown in SEQ ID
NO:14; and

(b) hybridizes under highly stringent conditions to the
nucleotide sequence of SEQ ID NO:13, wherein the
highly stringent conditions are: hybridization in 0.5 M
NaHPO4, 7% sodium dodecyl sulfate (SDS), and 1 mM
EDTA at 65° C. and washing in 0.1xSSC/0.1% SDS at
68° C.

3. An isolated nucleic acid molecule comprising a human
ATPase nucleotide sequence encoding the amino acid
sequence of SEQ ID NO:14.

* * * * *
HUMAN PROTEASES AND
POLYNUCLEOTIDES ENCODING THE 
SAME 

The present application claims the benefit of U.S. Pro- 
visional Application Number 60/225,852 which was filed on 
Aug. 16, 2000 and is herein incorporated by reference in its 
entirety. 

1. INTRODUCTION 

The present invention relates to the discovery, 
identification, and characterization of novel human poly- 
nucleotides encoding proteins sharing sequence similarity 
with mammalian proteases. The invention encompasses the 
described polynucleotides, host cell expression systems, the 
encoded proteins, fusion proteins, polypeptides and 
peptides, antibodies to the encoded proteins and peptides, 
and genetically engineered animals that either lack or over 
express the disclosed sequences, antagonists and agonists of 
the proteins, and other compounds that modulate the expres- 
sion or activity of the proteins encoded by the disclosed 
polynucleotides that can be used for diagnosis, drug 
screening, clinical trial monitoring, or the treatment of 
physiological disorders or diseases. 

2. BACKGROUND OF THE INVENTION 

Proteases cleave protein substrates as part of degradation, 
maturation, and secretory pathways within the body. Pro- 
teases have been associated with, inter alia, regulating 
development, diabetes, obesity, infertility, modulating cel- 
lular processes, and infectious disease. 

3. SUMMARY OF THE INVENTION 

The present invention relates to the discovery, 
identification, and characterization of nucleotides that 
encode novel human proteins, and the corresponding amino 
acid sequences of these proteins. The novel human proteins 
(NHPs) described for the first time herein share structural 
similarity with animal proteases and particularly zinc
metalloproteases.

The novel human nucleic acid (cDNA) sequences 
described herein, encode proteins/open reading frames 
(ORFs) of 491 and 1224 amino acids in length (see SEQ ID 
NOS: 2 and 4 respectively). 

The invention also encompasses agonists and antagonists 
of the described NHPs, including small molecules, large 
molecules, mutant NHPs, or portions thereof that compete 
with native NHPs, NHP peptides, and antibodies, as well as 
nucleotide sequences that can be used to inhibit the expres- 
sion of the described NHPs (e.g., antisense and ribozyme 
molecules, and gene or regulatory sequence replacement 
constructs) or to enhance the expression of the described 
NHPs (e.g., expression constructs that place the described 
gene under the control of a strong promoter system), and 
transgenic animals that express a NHP transgene, or
"knockouts" (which can be conditional) that do not express a
functional NHP. 

Further, the present invention also relates to processes for 
identifying compounds that modulate, i.e., act as agonists or 
antagonists, of NHP expression and/or NHP activity that 
utilize purified preparations of the described NHPs and/or 
NHP products, or cells expressing the same. Such com- 
pounds can be used as therapeutic agents for the treatment 
of any of a wide variety of symptoms associated with 
biological disorders or imbalances. 
4. DESCRIPTION OF THE SEQUENCE LISTING
AND FIGURES 

The Sequence Listing provides the sequences of several 
NHP ORFs encoding the described NHP amino acid 
sequences. SEQ ID NO:5 describes a NHP ORF and flank- 
ing sequences. 

5. DETAILED DESCRIPTION OF THE 
INVENTION 

The NHPs described for the first time herein are novel
proteins that are expressed in, inter alia, human cell lines,
and human fetal brain, brain, pituitary, cerebellum, spinal
cord, thymus, lymph node, trachea, kidney, fetal liver,
prostate, testis, thyroid, adrenal gland, pancreas, small
intestine, colon, skeletal muscle, heart, uterus, mammary
gland, adipose, esophagus, bladder, cervix, pericardium,
ovary, fetal kidney, and fetal lung cells.

The described sequences were compiled from cDNA
clones, genomic sequence, and cDNAs derived from human
kidney, mammary gland, and cerebellum mRNAs (Edge
Biosystems, Gaithersburg, Md., and Clontech, Palo Alto,
Calif.). The present invention encompasses the nucleotides
presented in the Sequence Listing, host cells expressing such
nucleotides, the expression products of such nucleotides,
and: (a) nucleotides that encode mammalian homologs of
the described genes, including the specifically described
NHPs, and NHP products; (b) nucleotides that encode one or
more portions of a NHP that correspond to functional
domains, and the polypeptide products specified by such
nucleotide sequences, including but not limited to the novel
regions of any active domain(s); (c) isolated nucleotides that
encode mutant versions, engineered or naturally occurring,
of the described NHPs in which all or a part of at least one
domain is deleted or altered, and the polypeptide products
specified by such nucleotide sequences, including but not
limited to soluble proteins and peptides in which all or a
portion of the signal sequence is deleted; (d) nucleotides that
encode chimeric fusion proteins containing all or a portion
of a coding region of a NHP, or one of its domains (e.g., a
receptor or ligand binding domain, accessory
protein/self-association domain, etc.) fused to another peptide or
polypeptide; or (e) therapeutic or diagnostic derivatives of
the described polynucleotides such as oligonucleotides,
antisense polynucleotides, ribozymes, dsRNA, or gene therapy
constructs comprising a sequence first disclosed in the
Sequence Listing.

As discussed above, the present invention includes: (a)
the human DNA sequences presented in the Sequence Listing
(and vectors comprising the same) and additionally
contemplates any nucleotide sequence encoding a contiguous
NHP open reading frame (ORF), or a contiguous exon
splice junction first described in the Sequence Listing, that
hybridizes to a complement of a DNA sequence presented in
the Sequence Listing under highly stringent conditions, e.g.,
hybridization to filter-bound DNA in 0.5 M NaHPO4, 7%
sodium dodecyl sulfate (SDS), 1 mM EDTA at 65° C., and
washing in 0.1xSSC/0.1% SDS at 68° C. (Ausubel F. M. et
al., eds., 1989, Current Protocols in Molecular Biology, Vol.
I, Green Publishing Associates, Inc., and John Wiley & Sons,
Inc., New York, at p. 2.10.3) and encodes a functionally
equivalent gene product. Additionally contemplated are any
nucleotide sequences that hybridize to the complement of
the DNA sequence that encode and express an amino acid
sequence presented in the Sequence Listing under moderately
stringent conditions, e.g., washing in 0.2xSSC/0.1%
SDS at 42° C. (Ausubel et al., 1989, supra), yet still encode
a functionally equivalent NHP product. Functional equivalents
of a NHP include naturally occurring NHPs present in
other species and mutant NHPs whether naturally occurring
or engineered (by site directed mutagenesis, gene shuffling,
directed evolution as described in, for example, U.S. Pat.
No. 5,837,458). The invention also includes degenerate
nucleic acid variants of the disclosed NHP polynucleotide
sequences.
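
To illustrate what degeneracy of the genetic code implies for such variants, the following minimal Python sketch counts how many distinct nucleotide sequences encode a given peptide under the standard genetic code. It is an illustration only and not part of the disclosure; the helper name `count_encodings` and the example peptide are assumptions introduced here.

```python
from collections import Counter
from math import prod

# One-letter amino acids for the 64 codons of the standard genetic code,
# listed in TCAG order ('*' marks stop codons).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Number of synonymous codons available for each amino acid.
CODONS_PER_RESIDUE = Counter(AMINO)


def count_encodings(peptide: str) -> int:
    """Return the number of distinct coding sequences for a peptide."""
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide.upper())


# Arbitrary six-residue example peptide (Met-Ser-Asp-Thr-Glu-Tyr).
print(count_encodings("MSDTEY"))  # 1 * 6 * 2 * 4 * 2 * 2 = 192
```

Even a short peptide therefore corresponds to many degenerate nucleic acid variants, and the count grows multiplicatively with length.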

Additionally contemplated are polynucleotides encoding 
a NHP ORF, or its functional equivalent, encoded by a 
polynucleotide sequence that is about 99, 95, 90, or about 85 
percent similar or identical to corresponding regions of the 
nucleotide sequences of the Sequence Listing (as measured 
by BLAST sequence comparison analysis using, for 
example, the GCG sequence analysis package using stan- 
dard default settings). 
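
As a rough illustration of the percent identity figures mentioned above, the following Python sketch computes identity over two sequences that are assumed to be already aligned and of equal length. This is a simplification: it is not the BLAST algorithm or the GCG package, which also handle gaps and local alignment, and the function name `percent_identity` is an assumption introduced here.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Two arbitrary 10-base strings differing at one position.
print(percent_identity("atgagtgaca", "atgagtgact"))  # 90.0
```

Under this simplified measure, a sequence that is about 85 percent identical to a 100-base region differs from it at about 15 positions.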

The invention also includes nucleic acid molecules, pref- 
erably DNA molecules, that hybridize to, and are therefore 
the complements of, the described NHP gene nucleotide 
sequences. Such hybridization conditions may be highly 
stringent or less highly stringent, as described above. In 
instances where the nucleic acid molecules are
deoxyoligonucleotides ("DNA oligos"), such molecules are generally
about 16 to about 100 bases long, or about 20 to about 80, 
or about 34 to about 45 bases long, or any variation or 
combination of sizes represented therein that incorporate a 
contiguous region of sequence first disclosed in the 
Sequence Listing. Such oligonucleotides can be used in 
conjunction with the polymerase chain reaction (PCR) to 
screen libraries, isolate clones, and prepare cloning and 
sequencing templates, etc. 

Alternatively, such NHP oligonucleotides can be used as 
hybridization probes for screening libraries, and assessing 
gene expression patterns (particularly using a microarray or
high-throughput "chip" format). Additionally, a series of the 
described NHP oligonucleotide sequences, or the comple- 
ments thereof, can be used to represent all or a portion of the 
described NHP sequences. The oligonucleotides, typically 
between about 16 to about 40 (or any whole number within 
the stated range) nucleotides in length may partially overlap 
each other and/or a NHP sequence may be represented using 
oligonucleotides that do not overlap. Accordingly, the 
described NHP polynucleotide sequences shall typically 
comprise at least about two or three distinct oligonucleotide 
sequences of at least about 18, and preferably about 25, 
nucleotides in length that are each first disclosed in the 
described Sequence Listing. Such oligonucleotide 
sequences may begin at any nucleotide present within a 
sequence in the Sequence Listing and proceed in either a 
sense (5'-to-3') orientation vis-a-vis the described sequence 
or in an antisense orientation. 
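
The representation of a sequence by a series of overlapping oligonucleotides in sense or antisense orientation can be sketched as follows. This Python example is illustrative only; the function names `reverse_complement` and `tile_oligos`, the 25-nucleotide length, the 10-nucleotide step, and the 60-base example string are assumptions chosen for the sketch, not values prescribed by the text.

```python
# Translation table for complementing DNA bases (both cases).
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the antisense (reverse-complement) strand, written 5'-to-3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(seq: str, length: int = 25, step: int = 10) -> list[str]:
    """Return sense-strand oligos of a fixed length starting every `step` bases."""
    return [seq[i:i + length] for i in range(0, len(seq) - length + 1, step)]


# Arbitrary 60-base example string.
example = "atgagtgacactgaatatgcagagtggctgaggaatcattttttagctgaaaccagcatt"

sense_oligos = tile_oligos(example)                           # partially overlapping 25-mers
antisense_oligos = [reverse_complement(o) for o in sense_oligos]
print(len(sense_oligos), sense_oligos[0], antisense_oligos[0])
```

Choosing a step smaller than the oligo length yields partially overlapping oligos; a step equal to or larger than the length yields oligos that do not overlap.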

For oligonucleotide probes, highly stringent conditions 
may refer, e.g., to washing in 6xSSC/0.05% sodium pyro- 
phosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base 
oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base 
oligos). These nucleic acid molecules may encode or act as 
NHP gene antisense molecules, useful, for example, in NHP 
gene regulation (for and/or as antisense primers in amplifi- 
cation reactions of NHP gene nucleic acid sequences). With 
respect to NHP gene regulation, such techniques can be used 
to regulate biological functions. Further, such sequences 
may be used as part of ribozyme and/or triple helix 
sequences that are also useful for NHP gene regulation. 
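
The wash temperatures given above for oligonucleotide probes can be captured as a simple lookup. The following Python sketch is illustrative only; the names `WASH_TEMP_C` and `wash_temperature_c` are assumptions, and the function deliberately does not interpolate for lengths the text does not list.

```python
# Wash temperatures (degrees C) in 6xSSC/0.05% sodium pyrophosphate,
# keyed by oligo length in bases, as stated in the paragraph above.
WASH_TEMP_C = {14: 37, 17: 48, 20: 55, 23: 60}


def wash_temperature_c(oligo_length: int) -> int:
    """Return the stated wash temperature for a listed oligo length."""
    if oligo_length not in WASH_TEMP_C:
        raise ValueError(f"no wash temperature is stated for {oligo_length}-base oligos")
    return WASH_TEMP_C[oligo_length]


print(wash_temperature_c(20))  # 55
```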

Inhibitory antisense or double stranded oligonucleotides
can additionally comprise at least one modified base moiety
which is selected from the group including but not limited to
5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil,
hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl) uracil,
5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine,
5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine,
5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic
acid (v), wybutoxosine, pseudouracil, queosine,
2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil,
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid
methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil,
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w,
and 2,6-diaminopurine.

The antisense oligonucleotide can also comprise at least
one modified sugar moiety selected from the group including
but not limited to arabinose, 2-fluoroarabinose, xylulose,
and hexose.

In yet another embodiment, the antisense oligonucleotide
will comprise at least one modified phosphate backbone
selected from the group consisting of a phosphorothioate, a
phosphorodithioate, a phosphoramidothioate, a
phosphoramidate, a phosphordiamidate, a
methylphosphonate, an alkyl phosphotriester, and a
formacetal or analog thereof. In yet another embodiment, the
antisense oligonucleotide is an α-anomeric oligonucleotide.
An α-anomeric oligonucleotide forms specific double-stranded
hybrids with complementary RNA in which, contrary
to the usual β-units, the strands run parallel to each
other (Gautier et al., 1987, Nucl. Acids Res. 15:6625-6641).
The oligonucleotide is a 2'-O-methylribonucleotide (Inoue
et al., 1987, Nucl. Acids Res. 15:6131-6148), or a chimeric
RNA-DNA analogue (Inoue et al., 1987, FEBS Lett.
215:327-330). Alternatively, double stranded RNA can be
used to disrupt the expression and function of a targeted
NHP.

Oligonucleotides of the invention can be synthesized by
standard methods known in the art, e.g. by use of an
automated DNA synthesizer (such as are commercially
available from Biosearch, Applied Biosystems, etc.). As
examples, phosphorothioate oligonucleotides can be
synthesized by the method of Stein et al. (1988, Nucl. Acids Res.
16:3209), and methylphosphonate oligonucleotides can be
prepared by use of controlled pore glass polymer supports
(Sarin et al., 1988, Proc. Natl. Acad. Sci. U.S.A.
85:7448-7451), etc.

Low stringency conditions are well known to those of
skill in the art, and will vary predictably depending on the
specific organisms from which the library and the labeled
sequences are derived. For guidance regarding such conditions
see, for example, Sambrook et al., 1989, Molecular
Cloning, A Laboratory Manual (and periodic updates
thereof), Cold Spring Harbor Press, N.Y.; and Ausubel et
al., 1989, Current Protocols in Molecular Biology, Green
Publishing Associates and Wiley Interscience, N.Y.

Alternatively, suitably labeled NHP nucleotide probes can
be used to screen a human genomic library using appropriately
stringent conditions or by PCR. The identification and
characterization of human genomic clones is helpful for
identifying polymorphisms (including, but not limited to,
nucleotide repeats, microsatellite alleles, single nucleotide
polymorphisms, or coding single nucleotide
polymorphisms), determining the genomic structure of a
given locus/allele, and designing diagnostic tests. For
example, sequences derived from regions adjacent to the
intron/exon boundaries of the human gene can be used to
design primers for use in amplification assays to detect
mutations within the exons, introns, splice sites (e.g., splice
acceptor and/or donor sites), etc., that can be used in
diagnostics and pharmacogenomics.

Further, a NHP homolog can be isolated from nucleic acid
from an organism of interest by performing PCR using two
degenerate or "wobble" oligonucleotide primer pools
designed on the basis of amino acid sequences within the
NHP products disclosed herein. The template for the reaction
may be total RNA, mRNA, and/or cDNA obtained by
reverse transcription of mRNA prepared from human or
nonhuman cell lines or tissue known or suspected to express
an allele of a NHP gene. The PCR product can be subcloned
and sequenced to ensure that the amplified sequences
represent the sequence of the desired NHP gene. The PCR
fragment can then be used to isolate a full length cDNA
clone by a variety of methods. For example, the amplified
fragment can be labeled and used to screen a cDNA library,
such as a bacteriophage cDNA library. Alternatively, the
labeled fragment can be used to isolate genomic clones via
the screening of a genomic library.

PCR technology can also be used to isolate full length
cDNA sequences. For example, RNA can be isolated,
following standard procedures, from an appropriate cellular or
tissue source (i.e., one known, or suspected, to express a
NHP gene, such as, for example, testis tissue). A reverse
transcription (RT) reaction can be performed on the RNA
using an oligonucleotide primer specific for the most 5' end
of the amplified fragment for the priming of first strand
synthesis. The resulting RNA/DNA hybrid may then be
"tailed" using a standard terminal transferase reaction, the
hybrid may be digested with RNase H, and second strand
synthesis may then be primed with a complementary primer.
Thus, cDNA sequences upstream of the amplified fragment
can be isolated. For a review of cloning strategies that can
be used, see e.g., Sambrook et al., 1989, supra.

A cDNA encoding a mutant NHP gene can be isolated, for
example, by using PCR. In this case, the first cDNA strand
may be synthesized by hybridizing an oligo-dT
oligonucleotide to mRNA isolated from tissue known or suspected to
be expressed in an individual putatively carrying a mutant
NHP allele, and by extending the new strand with reverse
transcriptase. The second strand of the cDNA is then
synthesized using an oligonucleotide that hybridizes
specifically to the 5' end of the normal gene. Using these two
primers, the product is then amplified via PCR, optionally

corresponding mutant NHP allele in such libraries. Clones
containing mutant NHP gene sequences can then be purified
and subjected to sequence analysis according to methods
well known to those skilled in the art.

Additionally, an expression library can be constructed
utilizing cDNA synthesized from, for example, RNA
isolated from a tissue known, or suspected, to express a mutant
NHP allele in an individual suspected of or known to carry
such a mutant allele. In this manner, gene products made by
the putatively mutant tissue can be expressed and screened
using standard antibody screening techniques in conjunction
with antibodies raised against normal NHP product, as
described below. (For screening techniques, see, for
example, Harlow, E. and Lane, eds., 1988, "Antibodies: A
Laboratory Manual", Cold Spring Harbor Press, Cold Spring
Harbor.) Additionally, screening can be accomplished by
screening with labeled NHP fusion proteins, such as, for
example, AP-NHP or NHP-AP fusion proteins. In cases
where a NHP mutation results in an expressed gene product
with altered function (e.g., as a result of a missense or a
frameshift mutation), polyclonal antibodies to NHP are
likely to cross-react with a corresponding mutant NHP gene
product. Library clones detected via their reaction with such
labeled antibodies can be purified and subjected to sequence
analysis according to methods well known in the art.

The invention also encompasses (a) DNA vectors that
contain any of the foregoing NHP coding sequences and/or
their complements (i.e., antisense); (b) DNA expression
vectors that contain any of the foregoing NHP coding
sequences operatively associated with a regulatory element
that directs the expression of the coding sequences (for
example, baculovirus as described in U.S. Pat. No. 5,869,336
herein incorporated by reference); (c) genetically
engineered host cells that contain any of the foregoing NHP
coding sequences operatively associated with a regulatory
element that directs the expression of the coding sequences
in the host cell; and (d) genetically engineered host cells that
express an endogenous NHP gene under the control of an
exogenously introduced regulatory element (i.e., gene
activation) or genetically engineered transcription factor. As
used herein, regulatory elements include but are not limited
to inducible and non-inducible promoters, enhancers,
operators and other elements known to those skilled in the art that
drive and regulate expression. Such regulatory elements
include but are not limited to the cytomegalovirus hCMV
immediate early gene, regulatable, viral (particularly
retroviral LTR promoters) the early or late promoters of SV40
adenovirus, the lac system, the trp system, the TAC system,
the TRC system, the major operator and promoter regions of
phage lambda, the control regions of fd coat protein, the
promoter for 3-phosphoglycerate kinase (PGK), the promot-
cloned into a suitable vector, and subjected to DNA ers of acid phosphatase, and the promoters of the yeast 
sequence analysis through methods well known to those of a-mating factors. 

skill in the art. By comparing the DNA sequence of the The present invention also encompasses antibodies and 

mutant NHP allele to that of a corresponding normal NHP 55 anti- idiotypic antibodies (including Fab fragments), antago- 

allele, the mutation(s) responsible for the loss or alteration nists and agonists of a NHP, as well as compounds or 

of function of the mutant NHP gene product can be ascer- nucleotide constructs that inhibit expression of a NHP gene 

tained. (transcription factor inhibitors, antisense and ribozyme 

Alternatively, a genomic library can be constructed using molecules, or gene or regulatory sequence replacement 

DNA obtained from an individual suspected of or known to 60 constructs), or promote the expression of a NHP (e.g., 

carry a mutant NHP allele (e.g., a person manifesting a expression constructs in which NHP coding sequences are 

NHP-associated phenotype such as, for example, obesity, operatively associated with expression control elements 

high blood pressure, connective tissue disorders, infertility, such as promoters, promoter/enhancers, etc.). 

etc.), or a cDNA library can be constructed using RNA from The NHPs or NHP peptides, NHP fusion proteins, NHP 

a tissue known, or suspected, to express a mutant NHP 65 nucleotide sequences, antibodies, antagonists and agonists 

allele. A normal NHP gene, or any suitable fragment thereof, can be useful for the detection of mutant NHPs or inappro- 

can then be labeled and used as a probe to identify the priately expressed NHPs for the diagnosis of disease. The 
NHPs or NHP peptides, NHP fusion proteins, NHP nucleotide
sequences, host cell expression systems, antibodies,
antagonists, agonists and genetically engineered cells and
animals can be used for screening for drugs (or high
throughput screening of combinatorial libraries) effective in
the treatment of the symptomatic or phenotypic manifestations
of perturbing the normal function of NHP in the body.
The use of engineered host cells and/or animals may offer an
advantage in that such systems allow not only for the
identification of compounds that bind to the endogenous
receptor for a NHP, but can also identify compounds that
trigger NHP-mediated activities or pathways.

Finally, the NHP products can be used as therapeutics. For
example, soluble derivatives such as NHP peptides/domains
corresponding to NHP, NHP fusion protein products
(especially NHP-Ig fusion proteins, i.e., fusions of a NHP, or
a domain of a NHP, to an IgFc), NHP antibodies and
anti-idiotypic antibodies (including Fab fragments), antagonists
or agonists (including compounds that modulate or act
on downstream targets in a NHP-mediated pathway) can be
used to directly treat diseases or disorders. For instance, the
administration of an effective amount of soluble NHP, or a
NHP-IgFc fusion protein or an anti-idiotypic antibody (or its
Fab) that mimics a NHP could activate or effectively antagonize
the endogenous NHP receptor. Nucleotide constructs
encoding such NHP products can be used to genetically
engineer host cells to express such products in vivo; these
genetically engineered cells function as "bioreactors" in the
body delivering a continuous supply of a NHP, a NHP
peptide, or a NHP fusion protein to the body. Nucleotide
constructs encoding functional NHP, mutant NHPs, as well
as antisense and ribozyme molecules can also be used in
"gene therapy" approaches for the modulation of NHP
expression. Thus, the invention also encompasses pharmaceutical
formulations and methods for treating biological
disorders.

Various aspects of the invention are described in greater 
detail in the subsections below. 



5.1 THE NHP SEQUENCES 






The cDNA sequences and corresponding deduced amino
acid sequences of the described NHPs are presented in the
Sequence Listing. SEQ ID NO:5 describes a NHP ORF as
well as flanking regions. The NHP nucleotides were
obtained from human cDNA libraries using probes and/or
primers generated from human genomic sequence. Expression
analysis has provided evidence that the described NHP
can be expressed in a variety of human cells.
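
As an informal illustration of how a deduced amino acid sequence follows from a cDNA open
reading frame, the Python sketch below translates a nucleotide string with the standard
genetic code. The function name `translate_orf` and the table construction are illustrative
assumptions and are not part of the specification; the example input is the first 21
nucleotides of SEQ ID NO:1.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_orf(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon.

    Whitespace is ignored and any trailing partial codon is dropped.
    Returns the protein as one-letter residue codes.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        residue = CODON_TABLE[seq[pos:pos + 3]]
        if residue == "*":  # stop codon ends the reading frame
            break
        protein.append(residue)
    return "".join(protein)


# First seven codons of SEQ ID NO:1 (Met Lys Pro Arg Ala Arg Gly in SEQ ID NO:2).
print(translate_orf("atgaagcccc gcgcgcgcgg a"))  # MKPRARG
```

Applied to a complete ORF from the Sequence Listing, the same function would be expected to
reproduce the corresponding deduced protein sequence up to the terminal stop codon.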



5.2 NHPS AND NHP POLYPEPTIDES 






The NHPs, NHP polypeptides, NHP peptide fragments,
mutated, truncated, or deleted forms of NHP, and/or NHP
fusion proteins can be prepared for a variety of uses,
including but not limited to the generation of antibodies, as
reagents in diagnostic assays, the identification of other
cellular gene products related to a NHP, and as reagents in assays
for screening for compounds that can be used as pharmaceutical
reagents useful in the therapeutic treatment of
mental, biological, or medical disorders and disease. The
described NHPs share similarity with a variety of proteases,
including proteases having thrombospondin repeats,
disintegrins, aggrecanases, and metalloproteinases
(especially zinc metalloproteases of the ADAMTS family).

The Sequence Listing discloses the amino acid sequences
encoded by the described NHP polynucleotides. The NHPs
display initiator methionines in DNA sequence contexts
consistent with a translation initiation site, and several of the
ORFs display a signal-like sequence which can indicate that
the described NHP ORFs are secreted proteins or can be
membrane associated.

The NHP amino acid sequences of the invention include
the amino acid sequences presented in the Sequence Listing
as well as analogues and derivatives thereof. Further,
corresponding NHP homologues from other species are
encompassed by the invention. In fact, any NHPs encoded by a
NHP nucleotide sequence described above are within the
scope of the invention, as are any novel polynucleotide
sequences encoding all or any novel portion of an amino
acid sequence presented in the Sequence Listing. The
degenerate nature of the genetic code is well known, and,
accordingly, each amino acid presented in the Sequence
Listing is generically representative of the well known
nucleic acid "triplet" codon, or in many cases codons, that
can encode the amino acid. As such, as contemplated herein,
the amino acid sequences presented in the Sequence Listing,
when taken together with the genetic code (see, for example,
Table 4-1 at page 109 of "Molecular Cell Biology", 1986, J.
Darnell et al. eds., Scientific American Books, New York,
N.Y., herein incorporated by reference) are generically
representative of all the various permutations and combinations
of nucleic acid sequences that can encode such amino acid
sequences.
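
To give a sense of the scale of this degeneracy, the following Python sketch counts how many
distinct nucleotide sequences can encode a given peptide under the standard genetic code. The
function name `count_encoding_sequences` and the codon-count table are illustrative assumptions
and are not part of the specification; the example peptide is the first five residues of
SEQ ID NO:2.

```python
from math import prod

# Number of sense codons per amino acid in the standard genetic code (61 in total).
CODON_COUNTS = {
    "Ala": 4, "Arg": 6, "Asn": 2, "Asp": 2, "Cys": 2,
    "Gln": 2, "Glu": 2, "Gly": 4, "His": 2, "Ile": 3,
    "Leu": 6, "Lys": 2, "Met": 1, "Phe": 2, "Pro": 4,
    "Ser": 6, "Thr": 4, "Trp": 1, "Tyr": 2, "Val": 4,
}


def count_encoding_sequences(peptide):
    """Return the number of distinct coding sequences for a peptide.

    `peptide` is a whitespace-separated string of three-letter residue codes,
    in the style used by the Sequence Listing.
    """
    return prod(CODON_COUNTS[residue] for residue in peptide.split())


# Met(1) x Lys(2) x Pro(4) x Arg(6) x Ala(4)
print(count_encoding_sequences("Met Lys Pro Arg Ala"))  # 192
```

Because the count multiplies across residues, even a short peptide is represented by a large
family of nucleotide sequences, which is the sense in which each listed amino acid sequence is
generically representative of its encoding polynucleotides.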

The invention also encompasses proteins that are func- 
tionally equivalent to the NHPs encoded by the presently 
described nucleotide sequences as judged by any of a 
number of criteria, including, but not limited to, the ability 
to bind and cleave a substrate of a NHP, or the ability to 
effect an identical or complementary downstream pathway, 
or a change in cellular metabolism (e.g., proteolytic activity, 
ion flux, tyrosine phosphorylation, etc.). Such functionally 
equivalent NHP proteins include, but are not limited to, 
additions or substitutions of amino acid residues within the 
amino acid sequence encoded by the NHP nucleotide 
sequences described above, but which result in a silent 
change, thus producing a functionally equivalent gene prod- 
uct. Amino acid substitutions can be made on the basis of 
similarity in polarity, charge, solubility, hydrophobicity, 
hydrophilicity, and/or the amphipathic nature of the residues 
involved. For example, nonpolar (hydrophobic) amino acids 
include alanine, leucine, isoleucine, valine, proline, 
phenylalanine, tryptophan, and methionine; polar neutral 
amino acids include glycine, serine, threonine, cysteine, 
tyrosine, asparagine, and glutamine; positively charged 
(basic) amino acids include arginine, lysine, and histidine; 
and negatively charged (acidic) amino acids include aspartic 
acid and glutamic acid. 
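
The groupings above can be expressed as a small lookup, sketched here in Python. The names
`RESIDUE_CLASSES`, `residue_class`, and `same_class` are illustrative assumptions, and treating
a same-class substitution as a candidate conservative change is a simplification; whether a
given substitution is in fact silent depends on the position and function of the residue.

```python
# Residue classes exactly as enumerated in the paragraph above.
RESIDUE_CLASSES = {
    "nonpolar": {"Ala", "Leu", "Ile", "Val", "Pro", "Phe", "Trp", "Met"},
    "polar neutral": {"Gly", "Ser", "Thr", "Cys", "Tyr", "Asn", "Gln"},
    "basic": {"Arg", "Lys", "His"},
    "acidic": {"Asp", "Glu"},
}


def residue_class(residue):
    """Return the class name for a three-letter residue code."""
    for name, members in RESIDUE_CLASSES.items():
        if residue in members:
            return name
    raise ValueError(f"unknown residue: {residue}")


def same_class(original, substitute):
    """Return True when both residues fall in the same class."""
    return residue_class(original) == residue_class(substitute)


print(same_class("Leu", "Ile"))  # True: both nonpolar
print(same_class("Asp", "Lys"))  # False: acidic versus basic
```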

A variety of host-expression vector systems can be used 
to express the NHP nucleotide sequences of the invention. 
Where, as in the present instance, a NHP peptide or NHP 
polypeptide is thought to be a soluble or secreted molecule, 
the peptide or polypeptide can be recovered from the culture 
media. Such expression systems also encompass engineered 
host cells that express NHP, or functional equivalent, in situ. 
Purification or enrichment of a NHP from such expression 
systems can be accomplished using appropriate detergents 
and lipid micelles and methods well known to those skilled 
in the art. However, such engineered host cells themselves 
may be used in situations where it is important not only to 
retain the structural and functional characteristics of a NHP, 
but to assess biological activity, e.g., in drug screening 
assays. 

The expression systems that may be used for purposes of 
the invention include but are not limited to microorganisms 
such as bacteria (e.g., E. coli, B. subtilis) transformed with recombinant bacteriophage DNA,
plasmid DNA or cosmid DNA expression vectors containing NHP nucleotide sequences; yeast (e.g.,
Saccharomyces, Pichia) transformed with recombinant yeast expression vectors containing NHP
encoding nucleotide sequences; insect cell systems infected with recombinant virus expression
vectors (e.g., baculovirus) containing NHP sequences; plant cell systems infected with
recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic
virus, TMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid)
containing NHP nucleotide sequences; or mammalian cell systems (e.g., COS, CHO, BHK, 293, 3T3)
harboring recombinant expression constructs containing promoters derived from the genome of
mammalian cells (e.g., metallothionein promoter) or from mammalian viruses (e.g., the adenovirus
late promoter; the vaccinia virus 7.5 K promoter).

In bacterial systems, a number of expression vectors may be advantageously selected depending
upon the use intended for the NHP product being expressed. For example, when a large quantity of
such a protein is to be produced for the generation of pharmaceutical compositions of and/or
containing a NHP, or for raising antibodies to a NHP, vectors that direct the expression of high
levels of fusion protein products that are readily purified may be desirable. Such vectors
include, but are not limited to, the E. coli expression vector pUR278 (Ruther et al., 1983, EMBO
J. 2:1791), in which a NHP coding sequence may be ligated individually into the vector in frame
with the lacZ coding region so that a fusion protein is produced; pIN vectors (Inouye & Inouye,
1985, Nucleic Acids Res. 13:3101-3109; Van Heeke & Schuster, 1989, J. Biol. Chem.
264:5503-5509); and the like. pGEX vectors (Pharmacia or American Type Culture Collection) can
also be used to express foreign polypeptides as fusion proteins with glutathione S-transferase
(GST). In general, such fusion proteins are soluble and can easily be purified from lysed cells
by adsorption to glutathione-agarose beads followed by elution in the presence of free
glutathione. The pGEX vectors are designed to include thrombin or factor Xa protease cleavage
sites so that the cloned target gene product can be released from the GST moiety.

In an insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a
vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. A NHP gene
coding sequence can be cloned individually into non-essential regions (for example the
polyhedrin gene) of the virus and placed under control of an AcNPV promoter (for example the
polyhedrin promoter). Successful insertion of NHP gene coding sequence will result in
inactivation of the polyhedrin gene and production of non-occluded recombinant virus (i.e.,
virus lacking the proteinaceous coat coded for by the polyhedrin gene). These recombinant
viruses are then used to infect Spodoptera frugiperda cells in which the inserted gene is
expressed (e.g., see Smith et al., 1983, J. Virol. 46:584; Smith, U.S. Pat. No. 4,215,051).

In mammalian host cells, a number of viral-based expression systems may be utilized. In cases
where an adenovirus is used as an expression vector, the NHP nucleotide sequence of interest may
be ligated to an adenovirus transcription/translation control complex, e.g., the late promoter
and tripartite leader sequence. This chimeric gene may then be inserted in the adenovirus genome
by in vitro or in vivo recombination. Insertion in a non-essential region of the viral genome
(e.g., region E1 or E3) will result in a recombinant virus that is viable and capable of
expressing a NHP product in infected hosts (e.g., see Logan & Shenk, 1984, Proc. Natl. Acad.
Sci. USA 81:3655-3659). Specific initiation signals may also be required for efficient
translation of inserted NHP nucleotide sequences. These signals include the ATG initiation codon
and adjacent sequences. In cases where an entire NHP gene or cDNA, including its own initiation
codon and adjacent sequences, is inserted into the appropriate expression vector, no additional
translational control signals may be needed. However, in cases where only a portion of a NHP
coding sequence is inserted, exogenous translational control signals, including, perhaps, the
ATG initiation codon, must be provided. Furthermore, the initiation codon must be in phase with
the reading frame of the desired coding sequence to ensure translation of the entire insert.
These exogenous translational control signals and initiation codons can be of a variety of
origins, both natural and synthetic. The efficiency of expression may be enhanced by the
inclusion of appropriate transcription enhancer elements, transcription terminators, etc. (See
Bitter et al., 1987, Methods in Enzymol. 153:516-544).

In addition, a host cell strain may be chosen that modulates the expression of the inserted
sequences, or modifies and processes the gene product in the specific fashion desired. Such
modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products may be
important for the function of the protein. Different host cells have characteristic and specific
mechanisms for the post-translational processing and modification of proteins and gene products.
Appropriate cell lines or host systems can be chosen to ensure the correct modification and
processing of the foreign protein expressed. To this end, eukaryotic host cells which possess
the cellular machinery for proper processing of the primary transcript, glycosylation, and
phosphorylation of the gene product may be used. Such mammalian host cells include, but are not
limited to, CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and in particular, human cell lines.

For long-term, high-yield production of recombinant proteins, stable expression is preferred.
For example, cell lines which stably express the NHP sequences described above can be
engineered. Rather than using expression vectors which contain viral origins of replication,
host cells can be transformed with DNA controlled by appropriate expression control elements
(e.g., promoter, enhancer sequences, transcription terminators, polyadenylation sites, etc.),
and a selectable marker. Following the introduction of the foreign DNA, engineered cells may be
allowed to grow for 1-2 days in an enriched media, and then are switched to a selective media.
The selectable marker in the recombinant plasmid confers resistance to the selection and allows
cells to stably integrate the plasmid into their chromosomes and grow to form foci which in turn
can be cloned and expanded into cell lines. This method may advantageously be used to engineer
cell lines which express a NHP product. Such engineered cell lines may be particularly useful in
screening and evaluation of compounds that affect the endogenous activity of a NHP product.

A number of selection systems may be used, including but not limited to the herpes simplex virus
thymidine kinase (Wigler, et al., 1977, Cell 11:223), hypoxanthine-guanine phosphoribosyl
transferase (Szybalska & Szybalski, 1962, Proc. Natl. Acad. Sci. USA 48:2026), and adenine
phosphoribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes can be employed in tk⁻, hgprt⁻
or aprt⁻ cells, respectively. Also, antimetabolite resistance can be used as the basis of
selection for the following genes: dhfr, which confers resistance to methotrexate (Wigler, et
al., 1980, Natl. Acad. Sci.
USA 77:3567; O'Hare, et al., 1981, Proc. Natl. Acad. Sci. USA 78:1527); gpt, which confers
resistance to mycophenolic acid (Mulligan & Berg, 1981, Proc. Natl. Acad. Sci. USA 78:2072);
neo, which confers resistance to the aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J.
Mol. Biol. 150:1); and hygro, which confers resistance to hygromycin (Santerre, et al., 1984,
Gene 30:147).

Alternatively, any fusion protein can be readily purified by utilizing an antibody specific for
the fusion protein being expressed. For example, a system described by Janknecht et al. allows
for the ready purification of non-denatured fusion proteins expressed in human cell lines
(Janknecht, et al., 1991, Proc. Natl. Acad. Sci. USA 88:8972-8976). In this system, the gene of
interest is subcloned into a vaccinia recombination plasmid such that the gene's open reading
frame is translationally fused to an amino-terminal tag consisting of six histidine residues.
Extracts from cells infected with recombinant vaccinia virus are loaded onto Ni2+ nitriloacetic
acid-agarose columns and histidine-tagged proteins are selectively eluted with
imidazole-containing buffers.

5.3 ANTIBODIES TO NHP PRODUCTS

Antibodies that specifically recognize one or more epitopes of a NHP, or epitopes of conserved
variants of a NHP, or peptide fragments of a NHP are also encompassed by the invention. Such
antibodies include but are not limited to polyclonal antibodies, monoclonal antibodies (mAbs),
humanized or chimeric antibodies, single chain antibodies, Fab fragments, F(ab')2 fragments,
fragments produced by a Fab expression library, anti-idiotypic (anti-Id) antibodies, and
epitope-binding fragments of any of the above.

The antibodies of the invention may be used, for example, in the detection of a NHP in a
biological sample and may, therefore, be utilized as part of a diagnostic or prognostic
technique whereby patients may be tested for abnormal amounts of NHP. Such antibodies may also
be utilized in conjunction with, for example, compound screening schemes for the evaluation of
the effect of test compounds on expression and/or activity of a NHP gene product. Additionally,
such antibodies can be used in conjunction with gene therapy to, for example, evaluate the
normal and/or engineered NHP-expressing cells prior to their introduction into the patient. Such
antibodies may additionally be used as a method for the inhibition of abnormal NHP activity.
Thus, such antibodies may, therefore, be utilized as part of treatment methods.

For the production of antibodies, various host animals may be immunized by injection with a NHP,
an NHP peptide (e.g., one corresponding to a functional domain of a NHP), truncated NHP
polypeptides (NHP in which one or more domains have been deleted), functional equivalents of a
NHP or mutated variants of a NHP. Such host animals may include but are not limited to pigs,
rabbits, mice, goats, and rats, to name but a few. Various adjuvants may be used to increase the
immunological response, depending on the host species, including but not limited to Freund's
(complete and incomplete), mineral gels such as aluminum hydroxide, surface active substances
such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet
hemocyanin, dinitrophenol, and potentially useful human adjuvants such as BCG (bacille
Calmette-Guerin) and Corynebacterium parvum. Polyclonal antibodies are heterogeneous populations
of antibody molecules derived from the sera of the immunized animals.

Monoclonal antibodies, which are homogeneous populations of antibodies to a particular antigen,
can be obtained by any technique which provides for the production of antibody molecules by
continuous cell lines in culture. These include, but are not limited to, the hybridoma technique
of Kohler and Milstein, (1975, Nature 256:495-497; and U.S. Pat. No. 4,376,110), the human
B-cell hybridoma technique (Kosbor et al., 1983, Immunology Today 4:72; Cole et al., 1983, Proc.
Natl. Acad. Sci. USA 80:2026-2030), and the EBV-hybridoma technique (Cole et al., 1985,
Monoclonal Antibodies And Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). Such antibodies may be
of any immunoglobulin class including IgG, IgM, IgE, IgA, IgD and any subclass thereof. The
hybridoma producing the mAb of this invention may be cultivated in vitro or in vivo. Production
of high titers of mAbs in vivo makes this the presently preferred method of production.

In addition, techniques developed for the production of "chimeric antibodies" (Morrison et al.,
1984, Proc. Natl. Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature, 312:604-608; Takeda
et al., 1985, Nature, 314:452-454) by splicing the genes from a mouse antibody molecule of
appropriate antigen specificity together with genes from a human antibody molecule of
appropriate biological activity can be used. A chimeric antibody is a molecule in which
different portions are derived from different animal species, such as those having a variable
region derived from a murine mAb and a human immunoglobulin constant region.

Alternatively, techniques described for the production of single chain antibodies (U.S. Pat.
No. 4,946,778; Bird, 1988, Science 242:423-426; Huston et al., 1988, Proc. Natl. Acad. Sci. USA
85:5879-5883; and Ward et al., 1989, Nature 341:544-546) can be adapted to produce single chain
antibodies against NHP gene products. Single chain antibodies are formed by linking the heavy
and light chain fragments of the Fv region via an amino acid bridge, resulting in a single chain
polypeptide.

Antibody fragments which recognize specific epitopes may be generated by known techniques. For
example, such fragments include, but are not limited to: the F(ab')2 fragments which can be
produced by pepsin digestion of the antibody molecule and the Fab fragments which can be
generated by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, Fab
expression libraries may be constructed (Huse et al., 1989, Science, 246:1275-1281) to allow
rapid and easy identification of monoclonal Fab fragments with the desired specificity.

Antibodies to a NHP can, in turn, be utilized to generate anti-idiotype antibodies that "mimic"
a given NHP, using techniques well known to those skilled in the art. (See, e.g., Greenspan &
Bona, 1993, FASEB J. 7(5):437-444; and Nissinoff, 1991, J. Immunol. 147(8):2429-2438). For
example antibodies which bind to a NHP domain and competitively inhibit the binding of NHP to
its cognate receptor can be used to generate anti-idiotypes that "mimic" a NHP and, therefore,
bind and activate or neutralize a receptor. Such anti-idiotypic antibodies or Fab fragments of
such anti-idiotypes can be used in therapeutic regimens involving a NHP signaling pathway.

The present invention is not to be limited in scope by the specific embodiments described
herein, which are intended as single illustrations of individual aspects of the invention, and
functionally equivalent methods and components are within the scope of the invention. Indeed,
various modifications of the invention, in addition to those shown and described herein will
become apparent to those skilled in the art from the foregoing description. Such modifications
are intended to fall within the scope of the appended claims. All cited publications, patents,
and patent applications are herein incorporated by reference in their entirety.
SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 5 

<210> SEQ ID NO 1 

<211> LENGTH: 1476 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 1 



atgaagcccc gcgcgcgcgg atggcggggc ttggcggcgc tgtggatgct gttggcgcag 60

gtggccgagc aggcacctgc gtgcgccatg ggacccgcag cggcagcgcc tgggagcccg 120

agcgtcccgc gtcctcctcc acccgcggag cggccgggct ggatggaaaa gggcgaatat 180

gacctggtct ctgcctacga ggttgaccac aggggcgatt acgtgtccca tgaaatcatg 240

caccatcagc ggcggagaag agcagtggcc gtgtccgagg ttgagtctct tcaccttcgg 300

ctgaaaggct ccaggcacga cttccacgtg gatctgagga cttccagcag cctagtggct 360

cctggcttta ttgtgcagac gttgggaaag acaggcacta agtctgtgca gactttaccg 420

ccagaggact tctgtttcta tcaaggctct ttgcgatcac acagaaactc ctcagtggcc 480

ctttcaacct gccaaggctt gtcaggcatg atacgaacag aagaggcaga ttacttccta 540

aggccacttc cttcacacct ctcatggaaa ctcggcagag ctgcccaagg cagctcgcca 600

tcccacgtac tgtacaagag atccacagag ccccatgctc ctggggccag tgaggtcctg 660

gtgacctcaa ggacatggga gctggcacat caacccctgc acagcagcga ccttcgcctg 720

ggactgccac aaaagcagca tttctgtgga agacgcaaga aatacatgcc ccagcctccc 780

aaggaagacc tcttcatctt gccagatgag tataagtctt gcttacggca taagcgctct 840

cttctgaggt cccatagaaa tgaagaactg aacgtggaga ccttggtggt ggtcgacaaa 900

aagatgatgc aaaaccatgg ccatgaaaat atcaccacct acgtgctcac gatactcaac 960

atggtatctg ctttattcaa agatggaaca ataggaggaa acatcaacat tgcaattgta 1020

ggtctgattc ttctagaaga tgaacagcca ggactggtga taagtcacca cgcagaccac 1080

accttaagta gcttctgcca gtggcagtct ggattgatgg ggaaagatgg gactcgtcat 1140

gaccacgcca tcttactgac tggtctggat atatgttcct ggaagaatga gccctgtgac 1200

actttgggat ttgcacccat aagtggaatg tgtagtaaat atcgcagctg cacgattaat 1260

gaagatacag gtcttggact ggccttcacc attgcccatg agtctggaca caactttggc 1320

atgattcatg atggagaagg gaacatgtgt aaaaagtccg agggcaacat catgtcccct 1380

acattggcag gacgcaatgg agtcttctcc tggtcaccct gcagccgcca gtatctacac 1440

aaatttctaa gatcagtgaa aatgccagct ctctga 1476



<210> SEQ ID NO 2 

<211> LENGTH: 491 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 2

Met Lys Pro Arg Ala Arg Gly Trp Arg Gly Leu Ala Ala Leu Trp Met
1 5 10 15

Leu Leu Ala Gln Val Ala Glu Gln Ala Pro Ala Cys Ala Met Gly Pro
20 25 30

Ala Ala Ala Ala Pro Gly Ser Pro Ser Val Pro Arg Pro Pro Pro Pro
35 40 45

Ala Glu Arg Pro Gly Trp Met Glu Lys Gly Glu Tyr Asp Leu Val Ser
50 55 60

Ala Tyr Glu Val Asp His Arg Gly Asp Tyr Val Ser His Glu Ile Met
65 70 75 80

His His Gln Arg Arg Arg Arg Ala Val Ala Val Ser Glu Val Glu Ser
85 90 95

Leu His Leu Arg Leu Lys Gly Ser Arg His Asp Phe His Val Asp Leu
100 105 110

Arg Thr Ser Ser Ser Leu Val Ala Pro Gly Phe Ile Val Gln Thr Leu
115 120 125

Gly Lys Thr Gly Thr Lys Ser Val Gln Thr Leu Pro Pro Glu Asp Phe
130 135 140

Cys Phe Tyr Gln Gly Ser Leu Arg Ser His Arg Asn Ser Ser Val Ala
145 150 155 160

Leu Ser Thr Cys Gln Gly Leu Ser Gly Met Ile Arg Thr Glu Glu Ala
165 170 175

Asp Tyr Phe Leu Arg Pro Leu Pro Ser His Leu Ser Trp Lys Leu Gly
180 185 190

Arg Ala Ala Gln Gly Ser Ser Pro Ser His Val Leu Tyr Lys Arg Ser
195 200 205

Thr Glu Pro His Ala Pro Gly Ala Ser Glu Val Leu Val Thr Ser Arg
210 215 220

Thr Trp Glu Leu Ala His Gln Pro Leu His Ser Ser Asp Leu Arg Leu
225 230 235 240

Gly Leu Pro Gln Lys Gln His Phe Cys Gly Arg Arg Lys Lys Tyr Met
245 250 255

Pro Gln Pro Pro Lys Glu Asp Leu Phe Ile Leu Pro Asp Glu Tyr Lys
260 265 270

Ser Cys Leu Arg His Lys Arg Ser Leu Leu Arg Ser His Arg Asn Glu
275 280 285

Glu Leu Asn Val Glu Thr Leu Val Val Val Asp Lys Lys Met Met Gln
290 295 300

Asn His Gly His Glu Asn Ile Thr Thr Tyr Val Leu Thr Ile Leu Asn
305 310 315 320

Met Val Ser Ala Leu Phe Lys Asp Gly Thr Ile Gly Gly Asn Ile Asn
325 330 335

Ile Ala Ile Val Gly Leu Ile Leu Leu Glu Asp Glu Gln Pro Gly Leu
340 345 350

Val Ile Ser His His Ala Asp His Thr Leu Ser Ser Phe Cys Gln Trp
355 360 365

Gln Ser Gly Leu Met Gly Lys Asp Gly Thr Arg His Asp His Ala Ile
370 375 380

Leu Leu Thr Gly Leu Asp Ile Cys Ser Trp Lys Asn Glu Pro Cys Asp
385 390 395 400

Thr Leu Gly Phe Ala Pro Ile Ser Gly Met Cys Ser Lys Tyr Arg Ser
405 410 415

Cys Thr Ile Asn Glu Asp Thr Gly Leu Gly Leu Ala Phe Thr Ile Ala
420 425 430

His Glu Ser Gly His Asn Phe Gly Met Ile His Asp Gly Glu Gly Asn
435 440 445

Met Cys Lys Lys Ser Glu Gly Asn Ile Met Ser Pro Thr Leu Ala Gly
450 455 460

Arg Asn Gly Val Phe Ser Trp Ser Pro Cys Ser Arg Gln Tyr Leu His
465 470 475 480

Lys Phe Leu Arg Ser Val Lys Met Pro Ala Leu
485 490



<210> SEQ ID NO 3 

<211> LENGTH: 3675 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 3

atgaagcccc gcgcgcgcgg atggcggggc ttggcggcgc tgtggatgct gctggcgcag 60 

gtggccgagc aggcacctgc gtgcgccatg ggacccgcag cggcagcgcc tgggagcccg 120 

agcgtcccgc gtcctcctcc acccgcggag cggccgggct ggatggaaaa gggcgaatat 180 

gacctggtct ctgcctacga ggttgaccac aggggcgatt acgtgtccca tgaaatcatg 240 

caccatcagc ggcggagaag agcagtggcc gtgtccgagg ttgagtctct tcaccttcgg 300 

ctgaaaggct ccaggcacga cttccacgtg gatctgagga cttccagcag cctagtggct 360 

cctggcttta ttgtgcagac gttgggaaag acaggcacta agtctgtgca gactttaccg 420 

ccagaggact tctgtttcta tcaaggctct ttgcgatcac acagaaactc ctcagtggcc 480 

ctttcaacct gccaaggctt gtcaggcatg atacgaacag aagaggcaga ttacttccta 540 

aggccacttc cttcacacct ctcatggaaa ctcggcagag ctgcccaagg cagctcgcca 600 

tcccacgtac tgtacaagag atccacagag ccccatgctc ctggggccag tgaggtcctg 660 

gtgacctcaa ggacatggga gctggcacat caacccctgc acagcagcga ccttcgcctg 720 

ggactgccac aaaagcagca tttctgtgga agacgcaaga aatacatgcc ccagcctccc 780 

aaggaagacc tcttcatctt gccagatgag tataagtctt gcttacggca taagcgctct 840 

cttctgaggt cccatagaaa tgaagaactg aacgtggaga ccttggtggt ggtcgacaaa 900 

aagatgatgc aaaaccatgg ccatgaaaat atcaccacct acgtgctcac gatactcaac 960 

atggtatctg ctttattcaa agatggaaca ataggaggaa acatcaacat tgcaattgta 1020 

ggtctgattc ttctagaaga tgaacagcca ggactggtga taagtcacca cgcagaccac 1080 

accttaagta gcttctgcca gtggcagtct ggattgatgg ggaaagatgg gactcgtcat 1140 

gaccacgcca tcttactgac tggtctggat atatgttcct ggaagaatga gccctgtgac 1200 

actttgggat ttgcacccat aagtggaatg tgtagtaaat atcgcagctg cacgattaat 1260 

gaagatacag gtcttggact ggccttcacc attgcccatg agtctggaca caactttggc 1320 

atgattcatg atggagaagg gaacatgtgt aaaaagtccg agggcaacat catgtcccct 1380 

acattggcag gacgcaatgg agtcttctcc tggtcaccct gcagccgcca gtatctacac 1440 

aaatttctaa gcaccgctca agctatctgc cttgctgatc agccaaagcc tgtgaaggaa 1500 

tacaagtatc ctgagaaatt gccaggagaa ttatatgatg caaacacaca gtgcaagtgg 1560 

cagttcggag agaaagccaa gctctgcatg ctggacttta aaaaggacat ctgtaaagcc 1620 

ctgtggtgcc atcgtattgg aaggaaatgt gagactaaat ttatgccagc agcagaaggc 1680 

acaatttgtg ggcatgacat gtggtgccgg ggaggacagt gtgtgaaata tggtgatgaa 1740 

ggccccaagc ccacccatgg ccactggtcg gactggtctt cttggtcccc atgctccagg 1800 

acctgcggag ggggagtatc tcataggagt cgcctctgca ccaaccccaa gccatcgcat 1860 

ggagggaagt tctgtgaggg ctccactcgc actctgaagc tctgcaacag tcagaaatgt 1920 

ccccgggaca gtgttgactt ccgtgctgct cagtgtgccg agcacaacag cagacgattc 1980 
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19 



agagggcggc actacaagtg gaagccttac actcaagtag aagatcagga cttatgcaaa 2040 

ctctactgta tcgcagaagg atttgatttc ttcttttctt tgtcaaataa agtcaaagat 2100 

gggactccat gctcggagga tagccgtaat gtttgtatag atgggatatg tgagagagtt 2160 

ggatgtgaca atgtccttgg atctgatgct gttgaagacg tctgtggggt gtgtaacggg 2220 

aataactcag cctgcacgat tcacaggggt ctctacacca agcaccacca caccaaccag 2280 

tattatcaca tggtcaccat tccttctgga gcccggagta tccgcatcta tgaaatgaac 2340 

gtctctacct cctacatttc tgtgcgcaat gccctcagaa ggtactacct gaatgggcac 2400 

tggaccgtgg actggcccgg ccggtacaaa ttttcgggca ctactttcga ctacagacgg 2460 

tcctataatg agcccgagaa cttaatcgct actggaccaa ccaacgagac actgattgtg 2520 

gagctgctgt ttcagggaag gaacccgggt gttgcctggg aatactccat gcctcgcttg 2580 

gggaccgaga agcagccccc tgcccagccc agctacactt gggccatcgt gcgctctgag 2640 

tgctccgtgt cctgcggagg gggacagatg accgtgagag agggctgcta cagagacctg 2700 

aagtttcaag taaatatgtc cttctgcaat cccaagacac gacctgtcac ggggctggtg 2760 

ccttgcaaag tatctgcctg tcctcccagc tggtccgtgg ggaactggag tgcctgcagt 2820 

cggacgtgtg gcgggggtgc ccagagccgc cccgtgcagt gcacacggcg ggtgcactat 2880 

gactcggagc cagtcccggc cagcctgtgc cctcagcctg ctccctccag caggcaggcc 2940 

tgcaactctc agagctgccc acctgcatgg agcgccgggc cctgggcaga gtgctcacac 3000 

acctgtggga aggggtggag gaagcgggca gtggcctgta agagcaccaa cccctcggcc 3060 

agagcgcagc tgctgcccga cgctgtctgc acctccgagc ccaagcccag gatgcatgaa 3120 

gcctgtctgc ttcagcgctg ccacaagccc aagaagctgc agtggctggt gtccgcctgg 3180 

tcccagtgct ctgtgacatg tgaaagagga acacagaaaa gattcttaaa atgtgctgaa 3240 

aagtatgttt ctggaaagta tcgagagctg gcctcaaaga agtgctcaca tttgccgaag 3300 

cccagcctgg agctggaacg tgcctgcgcc ccgcttccat gccccaggca ccccccattt 3360 

gctgctgcgg gaccctcgag gggcagctgg tttgcctcac cctggtctca gtgcacggcc 3420 

agctgtgggg gaggcgttca gacgaggtcc gtgcagtgcc tggctggggg ccggccggcc 3480 

tcaggctgcc tcctgcacca gaagccttcg gcctccctgg cctgcaacac tcacttctgc 3540 

cccattgcag agaagaaaga tgccttctgc aaagactact tccactggtg ctacctggta 3600 

ccccagcacg ggatgtgcag ccacaagttc tacggcaagc agtgctgcaa gacttgctct 3660 

aagtccaact tgtga 3675 



<210> SEQ ID NO 4 

<211> LENGTH: 1224 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 4 

Met Lys Pro Arg Ala Arg Gly Trp Arg Gly Leu Ala Ala Leu Trp Met 
1 5 10 15 

Leu Leu Ala Gln Val Ala Glu Gln Ala Pro Ala Cys Ala Met Gly Pro 
20 25 30 

Ala Ala Ala Ala Pro Gly Ser Pro Ser Val Pro Arg Pro Pro Pro Pro 
35 40 45 

Ala Glu Arg Pro Gly Trp Met Glu Lys Gly Glu Tyr Asp Leu Val Ser 
50 55 60 

Ala Tyr Glu Val Asp His Arg Gly Asp Tyr Val Ser His Glu Ile Met 
65 70 75 80 

His His Gln Arg Arg Arg Arg Ala Val Ala Val Ser Glu Val Glu Ser 
85 90 95 

Leu His Leu Arg Leu Lys Gly Ser Arg His Asp Phe His Val Asp Leu 
100 105 110 

Arg Thr Ser Ser Ser Leu Val Ala Pro Gly Phe Ile Val Gln Thr Leu 
115 120 125 

Gly Lys Thr Gly Thr Lys Ser Val Gln Thr Leu Pro Pro Glu Asp Phe 
130 135 140 

Cys Phe Tyr Gln Gly Ser Leu Arg Ser His Arg Asn Ser Ser Val Ala 
145 150 155 160 

Leu Ser Thr Cys Gln Gly Leu Ser Gly Met Ile Arg Thr Glu Glu Ala 
165 170 175 

Asp Tyr Phe Leu Arg Pro Leu Pro Ser His Leu Ser Trp Lys Leu Gly 
180 185 190 

Arg Ala Ala Gln Gly Ser Ser Pro Ser His Val Leu Tyr Lys Arg Ser 
195 200 205 

Thr Glu Pro His Ala Pro Gly Ala Ser Glu Val Leu Val Thr Ser Arg 
210 215 220 

Thr Trp Glu Leu Ala His Gln Pro Leu His Ser Ser Asp Leu Arg Leu 
225 230 235 240 

Gly Leu Pro Gln Lys Gln His Phe Cys Gly Arg Arg Lys Lys Tyr Met 
245 250 255 

Pro Gln Pro Pro Lys Glu Asp Leu Phe Ile Leu Pro Asp Glu Tyr Lys 
260 265 270 

Ser Cys Leu Arg His Lys Arg Ser Leu Leu Arg Ser His Arg Asn Glu 
275 280 285 

Glu Leu Asn Val Glu Thr Leu Val Val Val Asp Lys Lys Met Met Gln 
290 295 300 

Asn His Gly His Glu Asn Ile Thr Thr Tyr Val Leu Thr Ile Leu Asn 
305 310 315 320 

Met Val Ser Ala Leu Phe Lys Asp Gly Thr Ile Gly Gly Asn Ile Asn 
325 330 335 

Ile Ala Ile Val Gly Leu Ile Leu Leu Glu Asp Glu Gln Pro Gly Leu 
340 345 350 

Val Ile Ser His His Ala Asp His Thr Leu Ser Ser Phe Cys Gln Trp 
355 360 365 

Gln Ser Gly Leu Met Gly Lys Asp Gly Thr Arg His Asp His Ala Ile 
370 375 380 

Leu Leu Thr Gly Leu Asp Ile Cys Ser Trp Lys Asn Glu Pro Cys Asp 
385 390 395 400 

Thr Leu Gly Phe Ala Pro Ile Ser Gly Met Cys Ser Lys Tyr Arg Ser 
405 410 415 

Cys Thr Ile Asn Glu Asp Thr Gly Leu Gly Leu Ala Phe Thr Ile Ala 
420 425 430 

His Glu Ser Gly His Asn Phe Gly Met Ile His Asp Gly Glu Gly Asn 
435 440 445 

Met Cys Lys Lys Ser Glu Gly Asn Ile Met Ser Pro Thr Leu Ala Gly 
450 455 460 

Arg Asn Gly Val Phe Ser Trp Ser Pro Cys Ser Arg Gln Tyr Leu His 
465 470 475 480 

Lys Phe Leu Ser Thr Ala Gln Ala Ile Cys Leu Ala Asp Gln Pro Lys 
485 490 495 
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Pro Val Lys Glu Tyr Lys Tyr Pro Glu Lys Leu Pro Gly Glu Leu Tyr 
500 505 510 

Asp Ala Asn Thr Gln Cys Lys Trp Gln Phe Gly Glu Lys Ala Lys Leu 
515 520 525 

Cys Met Leu Asp Phe Lys Lys Asp Ile Cys Lys Ala Leu Trp Cys His 
530 535 540 

Arg Ile Gly Arg Lys Cys Glu Thr Lys Phe Met Pro Ala Ala Glu Gly 
545 550 555 560 

Thr Ile Cys Gly His Asp Met Trp Cys Arg Gly Gly Gln Cys Val Lys 
565 570 575 

Tyr Gly Asp Glu Gly Pro Lys Pro Thr His Gly His Trp Ser Asp Trp 
580 585 590 

Ser Ser Trp Ser Pro Cys Ser Arg Thr Cys Gly Gly Gly Val Ser His 
595 600 605 

Arg Ser Arg Leu Cys Thr Asn Pro Lys Pro Ser His Gly Gly Lys Phe 
610 615 620 

Cys Glu Gly Ser Thr Arg Thr Leu Lys Leu Cys Asn Ser Gln Lys Cys 
625 630 635 640 

Pro Arg Asp Ser Val Asp Phe Arg Ala Ala Gln Cys Ala Glu His Asn 
645 650 655 

Ser Arg Arg Phe Arg Gly Arg His Tyr Lys Trp Lys Pro Tyr Thr Gln 
660 665 670 

Val Glu Asp Gln Asp Leu Cys Lys Leu Tyr Cys Ile Ala Glu Gly Phe 
675 680 685 

Asp Phe Phe Phe Ser Leu Ser Asn Lys Val Lys Asp Gly Thr Pro Cys 
690 695 700 

Ser Glu Asp Ser Arg Asn Val Cys Ile Asp Gly Ile Cys Glu Arg Val 
705 710 715 720 

Gly Cys Asp Asn Val Leu Gly Ser Asp Ala Val Glu Asp Val Cys Gly 
725 730 735 

Val Cys Asn Gly Asn Asn Ser Ala Cys Thr Ile His Arg Gly Leu Tyr 
740 745 750 

Thr Lys His His His Thr Asn Gln Tyr Tyr His Met Val Thr Ile Pro 
755 760 765 

Ser Gly Ala Arg Ser Ile Arg Ile Tyr Glu Met Asn Val Ser Thr Ser 
770 775 780 

Tyr Ile Ser Val Arg Asn Ala Leu Arg Arg Tyr Tyr Leu Asn Gly His 
785 790 795 800 

Trp Thr Val Asp Trp Pro Gly Arg Tyr Lys Phe Ser Gly Thr Thr Phe 
805 810 815 

Asp Tyr Arg Arg Ser Tyr Asn Glu Pro Glu Asn Leu Ile Ala Thr Gly 
820 825 830 

Pro Thr Asn Glu Thr Leu Ile Val Glu Leu Leu Phe Gln Gly Arg Asn 
835 840 845 

Pro Gly Val Ala Trp Glu Tyr Ser Met Pro Arg Leu Gly Thr Glu Lys 
850 855 860 

Gln Pro Pro Ala Gln Pro Ser Tyr Thr Trp Ala Ile Val Arg Ser Glu 
865 870 875 880 

Cys Ser Val Ser Cys Gly Gly Gly Gln Met Thr Val Arg Glu Gly Cys 
885 890 895 

Tyr Arg Asp Leu Lys Phe Gln Val Asn Met Ser Phe Cys Asn Pro Lys 
900 905 910 

Thr Arg Pro Val Thr Gly Leu Val Pro Cys Lys Val Ser Ala Cys Pro 
915 920 925 

Pro Ser Trp Ser Val Gly Asn Trp Ser Ala Cys Ser Arg Thr Cys Gly 
930 935 940 

Gly Gly Ala Gln Ser Arg Pro Val Gln Cys Thr Arg Arg Val His Tyr 
945 950 955 960 

Asp Ser Glu Pro Val Pro Ala Ser Leu Cys Pro Gln Pro Ala Pro Ser 
965 970 975 

Ser Arg Gln Ala Cys Asn Ser Gln Ser Cys Pro Pro Ala Trp Ser Ala 
980 985 990 

Gly Pro Trp Ala Glu Cys Ser His Thr Cys Gly Lys Gly Trp Arg Lys 
995 1000 1005 

Arg Ala Val Ala Cys Lys Ser Thr Asn Pro Ser Ala Arg Ala Gln Leu 
1010 1015 1020 

Leu Pro Asp Ala Val Cys Thr Ser Glu Pro Lys Pro Arg Met His Glu 
1025 1030 1035 1040 

Ala Cys Leu Leu Gln Arg Cys His Lys Pro Lys Lys Leu Gln Trp Leu 
1045 1050 1055 

Val Ser Ala Trp Ser Gln Cys Ser Val Thr Cys Glu Arg Gly Thr Gln 
1060 1065 1070 

Lys Arg Phe Leu Lys Cys Ala Glu Lys Tyr Val Ser Gly Lys Tyr Arg 
1075 1080 1085 

Glu Leu Ala Ser Lys Lys Cys Ser His Leu Pro Lys Pro Ser Leu Glu 
1090 1095 1100 

Leu Glu Arg Ala Cys Ala Pro Leu Pro Cys Pro Arg His Pro Pro Phe 
1105 1110 1115 1120 

Ala Ala Ala Gly Pro Ser Arg Gly Ser Trp Phe Ala Ser Pro Trp Ser 
1125 1130 1135 

Gln Cys Thr Ala Ser Cys Gly Gly Gly Val Gln Thr Arg Ser Val Gln 
1140 1145 1150 

Cys Leu Ala Gly Gly Arg Pro Ala Ser Gly Cys Leu Leu His Gln Lys 
1155 1160 1165 

Pro Ser Ala Ser Leu Ala Cys Asn Thr His Phe Cys Pro Ile Ala Glu 
1170 1175 1180 

Lys Lys Asp Ala Phe Cys Lys Asp Tyr Phe His Trp Cys Tyr Leu Val 
1185 1190 1195 1200 

Pro Gln His Gly Met Cys Ser His Lys Phe Tyr Gly Lys Gln Cys Cys 
1205 1210 1215 

Lys Thr Cys Ser Lys Ser Asn Leu 
1220 
1220 



<210> SEQ ID NO 5 

<211> LENGTH: 4042 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 5 

ccttcccgcg ctctgcttgg gtcgggtcct ccctgcccgc tcgcacgctg ccggccgggg 60 

accctccggt ggcccctagc ccctcggagc gctcctggat gaagccccgc gcgcgcggat 120 

ggcggggctt ggcggcgctg tggatgctgc tggcgcaggt ggccgagcag gcacctgcgt 180 

gcgccatggg acccgcagcg gcagcgcctg ggagcccgag cgtcccgcgt cctcctccac 240 

ccgcggagcg gccgggctgg atggaaaagg gcgaatatga cctggtctct gcctacgagg 300 

ttgaccacag gggcgattac gtgtcccatg aaatcatgca ccatcagcgg cggagaagag 360 

cagtggccgt gtccgaggtt gagtctcttc accttcggct gaaaggctcc aggcacgact 420 

tccacgtgga tctgaggact tccagcagcc tagtggctcc tggctttatt gtgcagacgt 480 

tgggaaagac aggcactaag tctgtgcaga ctttaccgcc agaggacttc tgtttctatc 540 

aaggctcttt gcgatcacac agaaactcct cagtggccct ttcaacctgc caaggcttgt 600 

caggcatgat acgaacagaa gaggcagatt acttcctaag gccacttcct tcacacctct 660 

catggaaact cggcagagct gcccaaggca gctcgccatc ccacgtactg tacaagagat 720 

ccacagagcc ccatgctcct ggggccagtg aggtcctggt gacctcaagg acatgggagc 780 

tggcacatca acccctgcac agcagcgacc ttcgcctggg actgccacaa aagcagcatt 840 

tctgtggaag acgcaagaaa tacatgcccc agcctcccaa ggaagacctc ttcatcttgc 900 

cagatgagta taagtcttgc ttacggcata agcgctctct tctgaggtcc catagaaatg 960 

aagaactgaa cgtggagacc ttggtggtgg tcgacaaaaa gatgatgcaa aaccatggcc 1020 

atgaaaatat caccacctac gtgctcacga tactcaacat ggtatctgct ttattcaaag 1080 

atggaacaat aggaggaaac atcaacattg caattgtagg tctgattctt ctagaagatg 1140 

aacagccagg actggtgata agtcaccacg cagaccacac cttaagtagc ttctgccagt 1200 

ggcagtctgg attgatgggg aaagatggga ctcgtcatga ccacgccatc ttactgactg 1260 

gtctggatat atgttcctgg aagaatgagc cctgtgacac tttgggattt gcacccataa 1320 

gtggaatgtg tagtaaatat cgcagctgca cgattaatga agatacaggt cttggactgg 1380 

ccttcaccat tgcccatgag tctggacaca actttggcat gattcatgat ggagaaggga 1440 

acatgtgtaa aaagtccgag ggcaacatca tgtcccctac attggcagga cgcaatggag 1500 

tcttctcctg gtcaccctgc agccgccagt atctacacaa atttctaagc accgctcaag 1560 

ctatctgcct tgctgatcag ccaaagcctg tgaaggaata caagtatcct gagaaattgc 1620 

caggagaatt atatgatgca aacacacagt gcaagtggca gttcggagag aaagccaagc 1680 

tctgcatgct ggactttaaa aaggacatct gtaaagccct gtggtgccat cgtattggaa 1740 

ggaaatgtga gactaaattt atgccagcag cagaaggcac aatttgtggg catgacatgt 1800 

ggtgccgggg aggacagtgt gtgaaatatg gtgatgaagg ccccaagccc acccatggcc 1860 

actggtcgga ctggtcttct tggtccccat gctccaggac ctgcggaggg ggagtatctc 1920 

ataggagtcg cctctgcacc aaccccaagc catcgcatgg agggaagttc tgtgagggct 1980 

ccactcgcac tctgaagctc tgcaacagtc agaaatgtcc ccgggacagt gttgacttcc 2040 

gtgctgctca gtgtgccgag cacaacagca gacgattcag agggcggcac tacaagtgga 2100 

agccttacac tcaagtagaa gatcaggact tatgcaaact ctactgtatc gcagaaggat 2160 

ttgatttctt cttttctttg tcaaataaag tcaaagatgg gactccatgc tcggaggata 2220 

gccgtaatgt ttgtatagat gggatatgtg agagagttgg atgtgacaat gtccttggat 2280 

ctgatgctgt tgaagacgtc tgtggggtgt gtaacgggaa taactcagcc tgcacgattc 2340 

acaggggtct ctacaccaag caccaccaca ccaaccagta ttatcacatg gtcaccattc 2400 

cttctggagc ccggagtatc cgcatctatg aaatgaacgt ctctacctcc tacatttctg 2460 

tgcgcaatgc cctcagaagg tactacctga atgggcactg gaccgtggac tggcccggcc 2520 

ggtacaaatt ttcgggcact actttcgact acagacggtc ctataatgag cccgagaact 2580 

taatcgctac tggaccaacc aacgagacac tgattgtgga gctgctgttt cagggaagga 2640 

acccgggtgt tgcctgggaa tactccatgc ctcgcttggg gaccgagaag cagccccctg 2700 




cccagcccag ctacacttgg gccatcgtgc gctctgagtg ctccgtgtcc tgcggagggg 2760 

gacagatgac cgtgagagag ggctgctaca gagacctgaa gtttcaagta aatatgtcct 2820 

tctgcaatcc caagacacga cctgtcacgg ggctggtgcc ttgcaaagta tctgcctgtc 2880 

ctcccagctg gtccgtgggg aactggagtg cctgcagtcg gacgtgtggc gggggtgccc 2940 

agagccgccc cgtgcagtgc acacggcggg tgcactatga ctcggagcca gtcccggcca 3000 

gcctgtgccc tcagcctgct ccctccagca ggcaggcctg caactctcag agctgcccac 3060 

ctgcatggag cgccgggccc tgggcagagt gctcacacac ctgtgggaag gggtggagga 3120 

agcgggcagt ggcctgtaag agcaccaacc cctcggccag agcgcagctg ctgcccgacg 3180 

ctgtctgcac ctccgagccc aagcccagga tgcatgaagc ctgtctgctt cagcgctgcc 3240 

acaagcccaa gaagctgcag tggctggtgt ccgcctggtc ccagtgctct gtgacatgtg 3300 

aaagaggaac acagaaaaga ttcttaaaat gtgctgaaaa gtatgtttct ggaaagtatc 3360 


gagagctggc ctcaaagaag tgctcacatt tgccgaagcc cagcctggag ctggaacgtg 3420 

cctgcgcccc gcttccatgc cccaggcacc ccccatttgc tgctgcggga ccctcgaggg 3480 

gcagctggtt tgcctcaccc tggtctcagt gcacggccag ctgtggggga ggcgttcaga 3540 

cgaggtccgt gcagtgcctg gctgggggcc ggccggcctc aggctgcctc ctgcaccaga 3600 

agccttcggc ctccctggcc tgcaacactc acttctgccc cattgcagag aagaaagatg 3660 

ccttctgcaa agactacttc cactggtgct acctggtacc ccagcacggg atgtgcagcc 3720 

acaagttcta cggcaagcag tgctgcaaga cttgctctaa gtccaacttg tgagttggga 3780 

ccgctctccg tagcagagaa agtgcctgcg tggcacagaa atttcccaca aatgagctgt 3840 

gcaatctacg tcggaataca tccaaggaag agcaaagcca aaagaagaaa accgtgttag 3900 

gctctttgac caggagtgta tgtatgtgtt tcactgtgag cctgggtgca gacctgtgtc 3960 

cccatgcaca cagtgtctcc tgtcaggctg aaatgtggca ccctggcaga cagagctgtg 4020 

gctcgtgagg cagaaggcag gc 4042 



What is claimed is: 

1. An isolated nucleic acid molecule comprising a nucle- 
otide sequence encoding an amino acid sequence drawn 
from the group consisting of SEQ ID NOS: 2 and 4. 

2. An isolated nucleic acid molecule comprising a nucle- 
otide sequence that: 

(a) encodes the amino acid sequence shown in SEQ ID 
NO: 4; and 

(b) hybridizes under stringent conditions of hybridization 
to filter-bound DNA in 0.5 M NaHPO4, 7% SDS, 1 mM 
EDTA at 65° C. and washing in 0.1x SSC/1% SDS at 68° 
C. to the nucleotide sequence of SEQ ID NO: 3 or the 
complement thereof. 

3. An isolated nucleic acid molecule according to claim 1 
wherein said nucleotide sequence is present in cDNA. 

4. An isolated nucleic acid molecule encoding the amino 
acid sequence presented in SEQ ID NO:4. 

5. An isolated nucleic acid molecule encoding the amino 
acid sequence presented in SEQ ID NO: 2. 

***** 
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HUMAN G-COUPLED PROTEIN RECEPTOR 
KINASES AND POLYNUCLEOTIDES 
ENCODING THE SAME 

The present application claims the benefit of U.S. Pro- 
visional Application No. 60/188,449 which was filed on 
Mar. 10, 2000 and is herein incorporated by reference in its 
entirety. 

1. INTRODUCTION 

The present invention relates to the discovery, 
identification, and characterization of novel human poly- 
nucleotides encoding proteins that share sequence similarity 
with animal kinases. The invention encompasses the 
described polynucleotides, host cell expression systems, the 
encoded proteins, fusion proteins, polypeptides and 
peptides, antibodies to the encoded proteins and peptides, 
and genetically engineered animals that either lack or over 
express the disclosed polynucleotides, antagonists and ago- 
nists of the proteins, and other compounds that modulate the 
expression or activity of the proteins encoded by the dis- 
closed polynucleotides that can be used for diagnosis, drug 
screening, clinical trial monitoring and the treatment of 
diseases and physiological disorders. 

2. BACKGROUND OF THE INVENTION 

Kinases mediate phosphorylation of a wide variety of 
proteins and compounds in the cell. In conjunction with 
phosphatases, kinases are involved in a range of regulatory 
and signaling pathways. Given the physiological importance 
of kinases, they have been subject to intense scrutiny and are 
proven drug targets. 

3. SUMMARY OF THE INVENTION 

The present invention relates to the discovery, 
identification, and characterization of nucleotides that 
encode novel human proteins and the corresponding amino 
acid sequences of these proteins. The novel human proteins 
(NHPs) described for the first time herein share structural 
similarity with animal kinases, including, but not limited to 
G-protein coupled receptor kinases (GRKs). As such, the 
novel polynucleotides encode novel GRKs having homo- 
logues and orthologs across a range of phyla and species. 

The novel human polynucleotides described herein 
encode open reading frames (ORFs) encoding proteins of 
553 and 353 amino acids in length (see SEQ ID NOS:2 and 
4, respectively). 

The invention also encompasses agonists and antagonists 
of the described NHPs, including small molecules, large 
molecules, mutant NHPs, or portions thereof, that compete 
with native NHP, peptides, and antibodies, as well as nucle- 
otide sequences that can be used to inhibit the expression of 
the described NHPs (e.g., antisense and ribozyme 
molecules, and gene or regulatory sequence replacement 
constructs) or to enhance the expression of the described 
NHP polynucleotides (e.g., expression constructs that place 
the described polynucleotide under the control of a strong 
promoter system), and transgenic animals that express a 
NHP transgene, or "knock-outs" (which can be conditional) 
that do not express a functional NHP. Knock-out mice can 
be produced in several ways, one of which involves the use 
of mouse embryonic stem cell ("ES cell") lines that 
contain gene trap mutations in a murine homolog of at least 
one of the described NHPs. When the unique NHP 
sequences described in SEQ ID NOS:1-5 are "knocked-out" 
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they provide a method of identifying phenotypic expression 
of the particular gene as well as a method of assigning 
function to previously unknown genes. Additionally, the 
unique NHP sequences described in SEQ ID NOS:1-5 are 
useful for the identification of coding sequence and the 
mapping of a unique gene to a particular chromosome. 

Further, the present invention also relates to processes for 
identifying compounds that modulate, i.e., act as agonists or 
antagonists of, NHP expression and/or NHP activity that 
utilize purified preparations of the described NHPs and/or 
NHP product, or cells expressing the same. Such compounds 
can be used as therapeutic agents for the treatment of any of 
a wide variety of symptoms associated with biological 
disorders or imbalances. 

4. DESCRIPTION OF THE SEQUENCE LISTING 
AND FIGURES 

The Sequence Listing provides the sequence of the novel 
human ORFs encoding the described novel human kinase 
proteins. SEQ ID NO:5 describes a full length ORF and 
flanking regions. 
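
By way of non-limiting illustration, the relationship between an ORF and its flanking regions can be sketched in Python as follows. The function name `find_orf`, the toy input sequence, and the choice of the standard genetic code are assumptions made for this example only; the sketch simply reads from the first ATG to the first in-frame stop codon and is not the method by which the described sequences were compiled.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def find_orf(seq):
    """Return (1-based start, one-letter protein) for the first ATG-initiated ORF.

    Hypothetical helper: returns None when the sequence contains no ATG.
    Unrecognized codons (e.g., containing N) are reported as 'X'.
    """
    dna = seq.upper()
    start = dna.find("ATG")
    if start == -1:
        return None
    protein = []
    # Step through complete codons only, stopping at the first stop codon.
    for i in range(start, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return start + 1, "".join(protein)


# Toy sequence: two flanking bases, a short ORF, a stop codon, two flanking bases.
print(find_orf("ccATGAAGCCCCGCTGAtt"))
```

A production tool would typically scan all six reading frames and report the longest ORF; the single-frame scan above is kept deliberately small.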

5. DETAILED DESCRIPTION OF THE 
INVENTION 


The NHPs, described for the first time herein, are novel 
proteins that are expressed in, inter alia, human cell lines, 
and human fetal brain, adult brain, pituitary, cerebellum, 
spinal cord, thymus, kidney, fetal liver, prostate, testis, 
adrenal gland, small intestine, skeletal muscle, uterus, 
placenta, mammary gland, and pericardium cells. The 
described sequences were compiled from gene trapped 
sequences in conjunction with sequences available in 
GENBANK, and cDNAs from adrenal gland, skeletal 
muscle, thymus, and testis libraries (Edge Biosystems, 
Gaithersburg, Md.). 

The present invention encompasses the nucleotides presented 
in the Sequence Listing, host cells expressing such 
nucleotides, the expression products of such nucleotides, 
and: (a) nucleotides that encode mammalian homologs of 
the described polynucleotides, including the specifically 
described NHPs, and the NHP products; (b) nucleotides that 
encode one or more portions of an NHP that correspond to 
functional domains, and the polypeptide products specified 
by such nucleotide sequences, including but not limited to 
the novel regions of any active domain(s); (c) isolated 
nucleotides that encode mutant versions, engineered or 
naturally occurring, of the described NHPs in which all or a 
part of at least one domain is deleted or altered, and the 
polypeptide products specified by such nucleotide 
sequences, including but not limited to soluble proteins and 
peptides in which all or a portion of the signal sequence is 
deleted; (d) nucleotides that encode chimeric fusion proteins 
containing all or a portion of a coding region of a NHP, or 
one of its domains (e.g., a receptor/ligand binding domain, 
accessory protein/self-association domain, etc.) fused to 
another peptide or polypeptide; or (e) therapeutic or diagnostic 
derivatives of the described polynucleotides such as 
oligonucleotides, antisense polynucleotides, ribozymes, 
dsRNA, or gene therapy constructs comprising a sequence 
first disclosed in the Sequence Listing. As discussed above, 
the present invention includes: (a) the human DNA 
sequences presented in the Sequence Listing (and vectors 
comprising the same) and additionally contemplates any 
nucleotide sequence encoding a contiguous NHP open reading 
frame (ORF) that hybridizes to a complement of a DNA 
sequence presented in the Sequence Listing under highly 
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stringent conditions, e.g., hybridization to filter-bound DNA 
in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM 
EDTA at 65° C., and washing in 0.1x SSC/0.1% SDS at 68° 
C. (Ausubel F. M. et al., eds., 1989, Current Protocols in 
Molecular Biology, Vol. I, Green Publishing Associates, 
Inc., and John Wiley & Sons, Inc., New York, at p. 2.10.3) 
and encodes a functionally equivalent gene product. Addi- 
tionally contemplated are any nucleotide sequences that 
hybridize to the complement of the DNA sequence that 
encode and express an amino acid sequence presented in the 
Sequence Listing under moderately stringent conditions, 
e.g., washing in 0.2x SSC/0.1% SDS at 42° C. (Ausubel et 
al., 1989, supra), yet still encode a functionally equivalent 
NHP product. Functional equivalents of a NHP include 
naturally occurring NHPs present in other species and 
mutant NHPs whether naturally occurring or engineered (by 
site directed mutagenesis, gene shuffling, directed evolution 
as described in, for example, U.S. Pat. No. 5,837,458). The 
invention also includes degenerate nucleic acid variants of 
the disclosed NHP polynucleotide sequences. 

Additionally contemplated are polynucleotides encoding 
NHP ORFs, or their functional equivalents, encoded by 
polynucleotide sequences that are about 99, 95, 90, or about 
85 percent similar to corresponding regions of SEQ ID NO: 1 
(as measured by BLAST sequence comparison analysis 
using, for example, the GCG sequence analysis package 
using default parameters). 
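
As a minimal illustration of what a percent-similarity figure such as 99, 95, 90, or 85 means, the following Python sketch computes percent identity over two sequences that are assumed to be already aligned, of equal length, and free of gaps. The function name `percent_identity` and the toy sequences are hypothetical; BLAST itself performs a local alignment with scoring parameters that this sketch does not attempt to reproduce.

```python
def percent_identity(a, b):
    """Percent of positions at which two pre-aligned, equal-length sequences agree.

    Assumption for this sketch: the inputs are already aligned and ungapped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)
    return 100.0 * matches / len(a)


# Toy example: nine of ten positions agree.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```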

The invention also includes nucleic acid molecules, pref- 
erably DNA molecules, that hybridize to, and are therefore 
the complements of, the described NHP encoding polynucle- 
otides. Such hybridization conditions can be highly stringent 
or less highly stringent, as described above. In instances 
where the nucleic acid molecules are deoxyoligonucleotides 
("DNA oligos"), such molecules are generally about 16 to 
about 100 bases long, or about 20 to about 80, or about 34 
to about 45 bases long, or any variation or combination of 
sizes represented therein that incorporate a contiguous 
region of sequence first disclosed in the Sequence Listing. 
Such oligonucleotides can be used in conjunction with the 
polymerase chain reaction (PCR) to screen libraries, isolate 
clones, and prepare cloning and sequencing templates, etc. 
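
The notion of a complement used above can be illustrated with a short Python sketch. The function name `reverse_complement` is an assumption of this example, and only the four unambiguous DNA bases are handled; any other character is passed through unchanged.

```python
# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the strand that would hybridize to `seq`, written 5'-to-3'."""
    return seq.translate(_COMPLEMENT)[::-1]


# Toy example using a ten-base oligonucleotide.
print(reverse_complement("atgaagcccc"))
```

Applying the function twice returns the original sequence, which is a convenient sanity check when preparing primer or probe lists.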

Alternatively, such NHP oligonucleotides can be used as 
hybridization probes for screening libraries, and assessing 
gene expression patterns (particularly using a microarray or 
high-throughput "chip" format). Additionally, a series of the 
described NHP oligonucleotide sequences, or the comple- 
ments thereof, can be used to represent all or a portion of the 
described NHP sequences. An oligonucleotide or polynucle- 
otide sequence first disclosed in at least a portion of one or 
more of the sequences of SEQ ID NOS: 1-5 can be used as 
a hybridization probe in conjunction with a solid support 
matrix/substrate (resins, beads, membranes, plastics, 
polymers, metal or metallized substrates, crystalline or poly- 
crystalline substrates, etc.). Of particular note are spatially 
addressable arrays (i.e., gene chips, microtiter plates, etc.) of 
oligonucleotides and polynucleotides, or corresponding oli- 
gopeptides and polypeptides, wherein at least one of the 
biopolymers present on the spatially addressable array com- 
prises an oligonucleotide or polynucleotide sequence first 
disclosed in at least one of the sequences of SEQ ID NOS: 
1-5, or an amino acid sequence encoded thereby. Methods 
for attaching biopolymers to, or synthesizing biopolymers 
on, solid support matrices, and conducting binding studies 
thereon are disclosed in, inter alia, U.S. Pat. Nos. 5,700,637, 
5,556,752, 5,744,305, 4,631,211, 5,445,934, 5,252,743, 
4,713,326, 5,424,186, and 4,689,405 the disclosures of 
which are herein incorporated by reference in their entirety. 
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Addressable arrays comprising sequences first disclosed 
in SEQ ID NOS: 1-5 can be used to identify and characterize 
the temporal and tissue specific expression of a gene. These 
addressable arrays incorporate oligonucleotide sequences of 
sufficient length to confer the required specificity, yet be 
within the limitations of the production technology. The 
length of these probes is within a range of between about 8 
to about 2000 nucleotides. Preferably the probes consist of 
60 nucleotides and more preferably 25 nucleotides from the 
sequences first disclosed in SEQ ID NOS: 1-5. 

For example, a series of the described oligonucleotide 
sequences, or the complements thereof, can be used in chip 
format to represent all or a portion of the described 
sequences. The oligonucleotides, typically between about 16 
to about 40 (or any whole number within the stated range) 
nucleotides in length, can partially overlap each other and/or 
the sequence may be represented using oligonucleotides that 
do not overlap. Accordingly, the described polynucleotide 
sequences shall typically comprise at least about two or 
three distinct oligonucleotide sequences of at least about 8 
nucleotides in length that are each first disclosed in the 
described Sequence Listing. Such oligonucleotide 
sequences can begin at any nucleotide present within a 
sequence in the Sequence Listing and proceed in either a 
sense (5'-to-3') orientation vis-a-vis the described sequence 
or in an antisense orientation. 
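
The tiling of a sequence into overlapping or non-overlapping oligonucleotides, in either orientation, can be sketched in Python as follows. The function name `tile_oligos`, its parameters, and the toy sequence are assumptions of this example and do not describe any particular array production process.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def tile_oligos(seq, length=25, step=None, antisense=False):
    """Return (1-based start, oligo) pairs tiled along `seq`.

    step < length gives partially overlapping oligos; step == length
    (the default) gives oligos that do not overlap. With antisense=True
    each oligo is reported as its reverse complement.
    """
    if step is None:
        step = length
    if length < 1 or step < 1:
        raise ValueError("length and step must be positive")
    oligos = []
    for start in range(0, len(seq) - length + 1, step):
        oligo = seq[start:start + length]
        if antisense:
            oligo = oligo.translate(_COMPLEMENT)[::-1]
        oligos.append((start + 1, oligo))
    return oligos


# Toy 20-base sequence tiled with 8-mers that overlap by four bases.
for position, oligo in tile_oligos("atgaagccccgcgcgcgcgg", length=8, step=4):
    print(position, oligo)
```

Choosing a smaller step increases redundancy across the represented region at the cost of more features on the array.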

Microarray-based analysis allows the discovery of broad 
patterns of genetic activity, providing new understanding of 
gene functions and generating novel and unexpected insight 
into transcriptional processes and biological mechanisms. 
The use of addressable arrays comprising sequences first 
disclosed in SEQ ID NOS: 1-5 provides detailed information 
about transcriptional changes involved in a specific pathway, 
potentially leading to the identification of novel components 
or gene functions that manifest themselves as novel phenotypes. 

Probes consisting of sequences first disclosed in SEQ ID 
NOS: 1-5 can also be used in the identification, selection and 
validation of novel molecular targets for drug discovery. The 
use of these unique sequences permits the direct confirmation 
of drug targets and recognition of drug dependent 
changes in gene expression that are modulated through 
pathways distinct from the drug's intended target. These 
unique sequences therefore also have utility in defining and 
monitoring both drug action and toxicity. 

As an example of utility, the sequences first disclosed in
SEQ ID NOS: 1-5 can be utilized in microarrays or other
assay formats, to screen collections of genetic material from
patients who have a particular medical condition. These
investigations can also be carried out using the sequences
first disclosed in SEQ ID NOS: 1-5 in silico and by comparing
previously collected genetic databases and the disclosed
sequences using computer software known to those in
the art.
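
A minimal sketch of such an in silico comparison is given below, assuming exact matching of a probe against both strands of each database record. The record identifiers, the probe, and the function names are hypothetical; practical software would typically use alignment methods that tolerate mismatches, which this sketch does not attempt.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'-to-3'."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def find_probe_hits(probe: str, database: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (record id, 0-based position, strand) for every exact probe match.

    "+" means the probe itself occurs in the record; "-" means its
    reverse complement occurs there.
    """
    hits = []
    strands = (("+", probe.upper()), ("-", reverse_complement(probe)))
    for record_id, sequence in database.items():
        sequence = sequence.upper()
        for strand, query in strands:
            start = sequence.find(query)
            while start != -1:
                hits.append((record_id, start, strand))
                start = sequence.find(query, start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical toy database; none of these sequences come from the Sequence Listing.
    toy_db = {
        "rec1": "GGGATGGCCAAGTT",
        "rec2": "AACTTGGCCATCCC",
        "rec3": "TTTTTTTT",
    }
    for hit in find_probe_hits("ATGGCCAAG", toy_db):
        print(hit)
```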

Thus the sequences first disclosed in SEQ ID NOS: 1-5
can be used to identify mutations associated with a particular
disease and also as a diagnostic or prognostic assay.
Although the presently described sequences have been
specifically described using nucleotide sequence, it should
be appreciated that each of the sequences can uniquely be
described using any of a wide variety of additional structural
attributes, or combinations thereof. For example, a given
sequence can be described by the net composition of the
nucleotides present within a given region of the sequence in
conjunction with the presence of one or more specific
oligonucleotide sequence(s) first disclosed in the SEQ ID
NOS: 1-5. Alternatively, a restriction map specifying the
relative positions of restriction endonuclease digestion sites,
or various palindromic or other specific oligonucleotide
sequences can be used to structurally describe a given
sequence. Such restriction maps, which are typically generated
by widely available computer programs (e.g., the University
of Wisconsin GCG sequence analysis package,
SEQUENCHER 3.0, Gene Codes Corp., Ann Arbor, Mich.,
etc.), can optionally be used in conjunction with one or more
discrete nucleotide sequence(s) present in the sequence that
can be described by the relative position of the sequence
relative to one or more additional sequence(s) or one or more
restriction sites present in the disclosed sequence.
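
By way of a hedged example, the two structural descriptions mentioned above (net nucleotide composition and a restriction map) can be computed as sketched below. This is not the GCG or SEQUENCHER implementation; the enzyme panel is limited to three well-known recognition sequences, and the toy sequence and function names are hypothetical.

```python
from collections import Counter

# Recognition sequences of three well-known restriction endonucleases.
RECOGNITION_SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def net_composition(region: str) -> dict[str, int]:
    """Count each nucleotide present in a region of a sequence."""
    return dict(Counter(region.upper()))


def restriction_map(seq: str, sites: dict[str, str] = RECOGNITION_SITES) -> dict[str, list[int]]:
    """Map each enzyme name to the 0-based start positions of its recognition site."""
    seq = seq.upper()
    found = {}
    for enzyme, site in sites.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        found[enzyme] = positions
    return found


if __name__ == "__main__":
    toy = "CCGAATTCAAGGATCCTTAAGCTTGAATTC"  # hypothetical placeholder sequence
    print(net_composition(toy))
    print(restriction_map(toy))
```

The relative position of any discrete sequence with respect to a restriction site follows by subtracting the reported start positions.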

For oligonucleotide probes, highly stringent conditions
may refer, e.g., to washing in 6x SSC/0.05% sodium
pyrophosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base
oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base
oligos). These nucleic acid molecules may encode or act as
NHP gene antisense molecules, useful, for example, in NHP
gene regulation (for and/or as antisense primers in
amplification reactions of NHP gene nucleic acid sequences). With
respect to NHP gene regulation, such techniques can be used
to regulate biological functions. Further, such sequences can
be used as part of ribozyme and/or triple helix sequences that
are also useful for NHP gene regulation, or as NHP
regulating aptamers.

Inhibitory antisense or double stranded oligonucleotides
can additionally comprise at least one modified base moiety
which is selected from the group including but not limited to
5-fluorouracil, 5-bromouracil, 5-chlorouracil, 5-iodouracil,
hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl) uracil,
5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine,
5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil, 5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine,
5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic
acid (v), wybutoxosine, pseudouracil, queosine,
2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil,
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid
methylester, uracil-5-oxyacetic acid (v), 5-methyl-2-thiouracil,
3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w,
and 2,6-diaminopurine.

The antisense oligonucleotide can also comprise at least
one modified sugar moiety selected from the group including
but not limited to arabinose, 2-fluoroarabinose, xylulose,
and hexose.

In yet another embodiment, the antisense oligonucleotide
will comprise at least one modified phosphate backbone
selected from the group consisting of a phosphorothioate, a
phosphorodithioate, a phosphoramidothioate, a
phosphoramidate, a phosphordiamidate, a
methylphosphonate, an alkyl phosphotriester, and a
formacetal or analog thereof.

In yet another embodiment, the antisense oligonucleotide
is an α-anomeric oligonucleotide. An α-anomeric
oligonucleotide forms specific double-stranded hybrids with
complementary RNA in which, contrary to the usual β-units,
the strands run parallel to each other (Gautier et al., 1987,
Nucl. Acids Res. 15:6625-6641). The oligonucleotide is a
2'-O-methylribonucleotide (Inoue et al., 1987, Nucl. Acids
Res. 15:6131-6148), or a chimeric RNA-DNA analogue
(Inoue et al., 1987, FEBS Lett. 215:327-330). Alternatively,
double stranded RNA can be used to disrupt the expression
and function of a targeted NHP.

Oligonucleotides of the invention can be synthesized by
standard methods known in the art, e.g. by use of an
automated DNA synthesizer (such as are commercially
available from Biosearch, Applied Biosystems, etc.). As
examples, phosphorothioate oligonucleotides can be
synthesized by the method of Stein et al. (1988, Nucl. Acids Res.
16:3209), and methylphosphonate oligonucleotides can be
prepared by use of controlled pore glass polymer supports
(Sarin et al., 1988, Proc. Natl. Acad. Sci. U.S.A.
85:7448-7451), etc.

Low stringency conditions are well known to those of
skill in the art, and will vary predictably depending on the
specific organisms from which the library and the labeled
sequences are derived. For guidance regarding such
conditions see, for example, Sambrook et al., 1989, Molecular
Cloning, A Laboratory Manual (and periodic updates
thereof), Cold Spring Harbor Press, N.Y.; and Ausubel et
al., 1989, Current Protocols in Molecular Biology, Greene
Publishing Associates and Wiley Interscience, N.Y.

Alternatively, suitably labeled NHP nucleotide probes can 
be used to screen a human genomic library using appropri- 
ately stringent conditions or by PCR. The identification and 
characterization of human genomic clones is helpful for 
identifying polymorphisms (including, but not limited to, 
nucleotide repeats, microsatellite alleles, single nucleotide 
polymorphisms, or coding single nucleotide 
polymorphisms), determining the genomic structure of a 
given locus/allele, and designing diagnostic tests. For 
example, sequences derived from regions adjacent to the 
intron/exon boundaries of the human gene can be used to 
design primers for use in amplification assays to detect 
mutations within the exons, introns, splice sites (e.g., splice 
acceptor and/or donor sites), etc., that can be used in 
diagnostics and pharmacogenomics. 

Further, a NHP gene homolog can be isolated from 
nucleic acid from an organism of interest by performing 
PCR using two degenerate or "wobble" oligonucleotide 
primer pools designed on the basis of amino acid sequences 
within the NHP products disclosed herein. The template for 
the reaction may be total RNA, mRNA, and/or cDNA 
obtained by reverse transcription of mRNA prepared from, 
for example, human or non-human cell lines or tissue known 
or suspected to express an allele of a NHP gene. 
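
The design of a degenerate ("wobble") primer pool from an amino acid sequence can be illustrated with the following sketch, which assumes the standard genetic code and enumerates every sense-strand sequence able to encode a short peptide. The peptide used and the function names are hypothetical and are not drawn from the NHP products; a reverse primer pool would be the reverse complement of a pool built this way.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}

# Invert the table: amino acid -> list of codons that encode it.
CODONS_FOR: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    CODONS_FOR.setdefault(aa, []).append(codon)


def pool_size(peptide: str) -> int:
    """Number of distinct oligonucleotides needed to cover every coding possibility."""
    size = 1
    for aa in peptide.upper():
        size *= len(CODONS_FOR[aa])
    return size


def primer_pool(peptide: str) -> list[str]:
    """Enumerate every sense-strand sequence that can encode the peptide."""
    choices = (CODONS_FOR[aa] for aa in peptide.upper())
    return ["".join(codons) for codons in product(*choices)]


if __name__ == "__main__":
    peptide = "WMDKC"  # hypothetical stretch rich in low-degeneracy residues
    print(pool_size(peptide))
    for primer in primer_pool(peptide):
        print(primer)
```

Stretches rich in methionine and tryptophan (one codon each) keep the pool small, whereas leucine, serine, and arginine (six codons each) enlarge it quickly.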

The PCR product can be subcloned and sequenced to 
ensure that the amplified sequences represent the sequence 
of the desired NHP gene. The PCR fragment can then be 
used to isolate a full length cDNA clone by a variety of 
methods. For example, the amplified fragment can be 
labeled and used to screen a cDNA library, such as a 
bacteriophage cDNA library. Alternatively, the labeled frag- 
ment can be used to isolate genomic clones via the screening 
of a genomic library. 

PCR technology can also be used to isolate full length 
cDNA sequences. For example, RNA can be isolated, fol- 
lowing standard procedures, from an appropriate cellular or 
tissue source (i.e., one known, or suspected, to express a 
NHP gene). A reverse transcription (RT) reaction can be 
performed on the RNA using an oligonucleotide primer 
specific for the most 5' end of the amplified fragment for the 
priming of first strand synthesis. The resulting RNA/DNA 
hybrid may then be "tailed" using a standard terminal 
transferase reaction, the hybrid may be digested with RNase
H, and second strand synthesis may then be primed with a
complementary primer. Thus, cDNA sequences upstream of
the amplified fragment can be isolated. For a review of
cloning strategies that can be used, see e.g., Sambrook et al.,
1989, supra.

A cDNA encoding a mutant NHP gene can be isolated, for
example, by using PCR. In this case, the first cDNA strand
may be synthesized by hybridizing an oligo-dT oligonucleotide
to mRNA isolated from tissue known or suspected to
be expressed in an individual putatively carrying a mutant
NHP allele, and by extending the new strand with reverse
transcriptase. The second strand of the cDNA is then
synthesized using an oligonucleotide that hybridizes specifically
to the 5' end of the normal gene. Using these two
primers, the product is then amplified via PCR, optionally
cloned into a suitable vector, and subjected to DNA
sequence analysis through methods well known to those of
skill in the art. By comparing the DNA sequence of the
mutant NHP allele to that of a corresponding normal NHP
allele, the mutation(s) responsible for the loss or alteration
of function of the mutant NHP gene product can be
ascertained.
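
The comparison of a mutant allele sequence with the corresponding normal allele can be sketched as below for the simplest case, in which the two sequences are already aligned and of equal length, so that only base substitutions are reported. The sequences and the function name are hypothetical; insertions and deletions would require an alignment step that this sketch omits.

```python
def find_substitutions(normal: str, mutant: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, normal base, mutant base) for each mismatch.

    Assumes the two sequences are pre-aligned and of equal length.
    """
    if len(normal) != len(mutant):
        raise ValueError("sequences must be pre-aligned and of equal length")
    pairs = zip(normal.upper(), mutant.upper())
    return [(pos, n, m) for pos, (n, m) in enumerate(pairs, start=1) if n != m]


if __name__ == "__main__":
    normal_allele = "ATGGCCAAGTTC"  # hypothetical normal sequence
    mutant_allele = "ATGGCTAAGTAC"  # hypothetical mutant sequence
    for position, normal_base, mutant_base in find_substitutions(normal_allele, mutant_allele):
        print(f"position {position}: {normal_base} -> {mutant_base}")
```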

Alternatively, a genomic library can be constructed using
DNA obtained from an individual suspected of or known to
carry a mutant NHP allele (e.g., a person manifesting a
NHP-associated phenotype such as, for example, immune
disorders, obesity, high blood pressure, etc.), or a cDNA
library can be constructed using RNA from a tissue known,
or suspected, to express a mutant NHP allele. A normal NHP
gene, or any suitable fragment thereof, can then be labeled
and used as a probe to identify the corresponding mutant
NHP allele in such libraries. Clones containing mutant NHP
gene sequences can then be purified and subjected to
sequence analysis according to methods well known to those
skilled in the art.

Additionally, an expression library can be constructed
utilizing cDNA synthesized from, for example, RNA isolated
from a tissue known, or suspected, to express a mutant
NHP allele in an individual suspected of or known to carry
such a mutant allele. In this manner, gene products made by
the putatively mutant tissue may be expressed and screened
using standard antibody screening techniques in conjunction
with antibodies raised against a normal NHP product, as
described below. (For screening techniques, see, for
example, Harlow, E. and Lane, eds., 1988, "Antibodies: A
Laboratory Manual", Cold Spring Harbor Press, Cold Spring
Harbor, N.Y.)

Additionally, screening can be accomplished by screening
with labeled NHP fusion proteins, such as, for example,
alkaline phosphatase-NHP or NHP-alkaline phosphatase
fusion proteins. In cases where a NHP mutation results in an
expressed gene product with altered function (e.g., as a
result of a missense or a frameshift mutation), polyclonal
antibodies to a NHP are likely to cross-react with a
corresponding mutant NHP gene product. Library clones detected
via their reaction with such labeled antibodies can be
purified and subjected to sequence analysis according to
methods well known in the art.

An additional application of the described novel human
polynucleotide sequences is their use in the molecular
mutagenesis/evolution of proteins that are at least partially
encoded by the described novel sequences using, for
example, polynucleotide shuffling or related methodologies.
Such approaches are described in U.S. Pat. Nos. 5,830,721
and 5,837,458 which are herein incorporated by reference in
their entirety.

The invention also encompasses (a) DNA vectors that
contain any of the foregoing NHP coding sequences and/or
their complements (i.e., antisense); (b) DNA expression
vectors that contain any of the foregoing NHP coding
sequences operatively associated with a regulatory element
that directs the expression of the coding sequences (for
example, baculovirus as described in U.S. Pat. No.
5,869,336 herein incorporated by reference); (c) genetically
engineered host cells that contain any of the foregoing NHP
coding sequences operatively associated with a regulatory
element that directs the expression of the coding sequences
in the host cell; and (d) genetically engineered host cells that
express an endogenous NHP gene under the control of an
exogenously introduced regulatory element (i.e., gene
activation). As used herein, regulatory elements include, but
are not limited to, inducible and non-inducible promoters,
enhancers, operators and other elements known to those
skilled in the art that drive and regulate expression. Such
regulatory elements include but are not limited to the
cytomegalovirus (hCMV) immediate early gene,
regulatable, viral elements (particularly retroviral LTR
promoters), the early or late promoters of SV40 adenovirus,
the lac system, the trp system, the TAC system, the TRC
system, the major operator and promoter regions of phage
lambda, the control regions of fd coat protein, the promoter
for 3-phosphoglycerate kinase (PGK), the promoters of acid
phosphatase, and the promoters of the yeast α-mating
factors.

Where, as in the present instance, some of the described 
NHP peptides or polypeptides are thought to be cytoplasmic 
proteins, expression systems can be engineered that produce 
soluble derivatives of a NHP (corresponding to a NHP 
extracellular and/or intracellular domains, or truncated 
polypeptides lacking one or more hydrophobic domains) 
and/or NHP fusion protein products (especially NHP-Ig 
fusion proteins, i.e., fusions of a NHP domain to an IgFc), 
NHP antibodies, and anti-idiotypic antibodies (including 
Fab fragments) that can be used in therapeutic applications. 
Preferably, the above expression systems are engineered to 
allow the desired peptide or polypeptide to be recovered 
from the culture media. 

The present invention also encompasses antibodies and 
anti-idiotypic antibodies (including Fab fragments), antago- 
nists and agonists of a NHP, as well as compounds or 
nucleotide constructs that inhibit expression of a NHP gene 
(transcription factor inhibitors, antisense and ribozyme 
molecules, or gene or regulatory sequence replacement 
constructs), or promote the expression of a NHP (e.g., 
expression constructs in which NHP coding sequences are 
operatively associated with expression control elements 
such as promoters, promoter/enhancers, etc.). 

The NHPs or NHP peptides, NHP fusion proteins, NHP 
nucleotide sequences, antibodies, antagonists and agonists 
can be useful for the detection of mutant NHPs or inappro- 
priately expressed NHPs for the diagnosis of disease. The 
NHP proteins or peptides, NHP fusion proteins, NHP nucle- 
otide sequences, host cell expression systems, antibodies, 
antagonists, agonists and genetically engineered cells and 
animals can be used for screening for drugs (or high 
throughput screening of combinatorial libraries) effective in 
the treatment of the symptomatic or phenotypic manifesta- 
tions of perturbing the normal function of a NHP in the body. 
The use of engineered host cells and/or animals can offer an 
advantage in that such systems allow not only for the 
identification of compounds that bind to the endogenous 
receptor/ligand of a NHP, but can also identify compounds 
that trigger NHP-mediated activities or pathways. 



Finally, the NHP products can be used as therapeutics. For
example, soluble derivatives such as NHP peptides/domains
corresponding to NHPs, NHP fusion protein products
(especially NHP-Ig fusion proteins, i.e., fusions of a NHP, or
a domain of a NHP, to an IgFc), NHP antibodies and
anti-idiotypic antibodies (including Fab fragments), antagonists
or agonists (including compounds that modulate or act
on downstream targets in a NHP-mediated pathway) can be
used to directly treat diseases or disorders. For instance, the
administration of an effective amount of soluble NHP, or a
NHP-IgFc fusion protein or an anti-idiotypic antibody (or its
Fab) that mimics the NHP could activate or effectively
antagonize the endogenous NHP or a protein interactive
therewith. Nucleotide constructs encoding such NHP products
can be used to genetically engineer host cells to express
such products in vivo; these genetically engineered cells
function as "bioreactors" in the body delivering a continuous
supply of a NHP, a NHP peptide, or a NHP fusion protein to
the body. Nucleotide constructs encoding functional NHPs,
mutant NHPs, as well as antisense and ribozyme molecules
can also be used in "gene therapy" approaches for the
modulation of NHP expression. Thus, the invention also
encompasses pharmaceutical formulations and methods for
treating biological disorders.

Various aspects of the invention are described in greater
detail in the subsections below.

5.1 The NHP Sequences

The cDNA sequences and corresponding deduced amino
acid sequences of the described NHPs are presented in the
Sequence Listing.

Expression analysis has provided evidence that the
described NHPs can be expressed in human tissues as well
as gene trapped human cells. In addition to GRKs, the
described NHPs share significant similarity to a range of
kinase families from a variety of phyla and species. Similar
polynucleotides encoding GRK proteins, as well as uses and
applications that are germane to the described NHPs are
described in U.S. Pat. Nos. 5,591,618 and 5,532,151 which
are herein incorporated by reference in their entirety.

5.2 NHPs and NHP Polypeptides

NHP products, polypeptides, peptide fragments, mutated,
truncated, or deleted forms of the NHPs, and/or NHP fusion
proteins can be prepared for a variety of uses. These uses
include, but are not limited to, the generation of antibodies,
as reagents in diagnostic assays, the identification of other
cellular gene products related to the NHP, as reagents in
assays for screening for compounds that can be used as
pharmaceutical reagents useful in the therapeutic treatment
of mental, biological, or medical disorders and disease.

The Sequence Listing discloses the amino acid sequences
encoded by the described NHP-encoding polynucleotides.
The NHPs display initiator methionines in a DNA sequence
context consistent with eucaryotic translation initiation site,
and a weak signal sequence characteristic of membrane
associated proteins.

The NHP amino acid sequences of the invention include
the amino acid sequences presented in the Sequence Listing
as well as analogues and derivatives thereof. Further,
corresponding NHP homologues from other species are
encompassed by the invention. In fact, any NHP protein encoded
by the NHP nucleotide sequences described above are within
the scope of the invention, as are any novel polynucleotide
sequences encoding all or any novel portion of an amino
acid sequence presented in the Sequence Listing. The
degenerate nature of the genetic code is well known, and,
accordingly, each amino acid presented in the Sequence
Listing, is generically representative of the well known
nucleic acid "triplet" codon, or in many cases codons, that
can encode the amino acid. As such, as contemplated herein,
the amino acid sequences presented in the Sequence Listing,
when taken together with the genetic code (see, for example,
Table 4-1 at page 109 of "Molecular Cell Biology", 1986, J.
Darnell et al. eds., Scientific American Books, New York,
N.Y., herein incorporated by reference) are generically
representative of all the various permutations and combinations
of nucleic acid sequences that can encode such amino acid
sequences.

The invention also encompasses proteins that are
functionally equivalent to the NHPs encoded by the presently
described nucleotide sequences as judged by any of a
number of criteria, including, but not limited to, the ability
to bind and modify a NHP substrate, or the ability to effect
an identical or complementary downstream pathway, or a
change in cellular metabolism (e.g., proteolytic activity, ion
flux, tyrosine phosphorylation, etc.). Such functionally
equivalent NHP proteins include, but are not limited to,
additions or substitutions of amino acid residues within the
amino acid sequence encoded by a NHP nucleotide sequence
described above, but which result in a silent change, thus
producing a functionally equivalent gene product. Amino
acid substitutions may be made on the basis of similarity in
polarity, charge, solubility, hydrophobicity, hydrophilicity,
and/or the amphipathic nature of the residues involved. For
example, nonpolar (hydrophobic) amino acids include
alanine, leucine, isoleucine, valine, proline, phenylalanine,
tryptophan, and methionine; polar neutral amino acids
include glycine, serine, threonine, cysteine, tyrosine,
asparagine, and glutamine; positively charged (basic) amino
acids include arginine, lysine, and histidine; and negatively
charged (acidic) amino acids include aspartic acid and
glutamic acid.

A variety of host-expression vector systems can be used
to express the NHP nucleotide sequences of the invention.
Where the NHP peptide or polypeptide can exist, or has been
engineered to exist, as a soluble or secreted molecule, the
soluble NHP peptide or polypeptide can be recovered from
the culture media. Such expression systems also encompass
engineered host cells that express a NHP, or functional
equivalent, in situ. Purification or enrichment of a NHP from
such expression systems can be accomplished using
appropriate detergents and lipid micelles and methods well known
to those skilled in the art. However, such engineered host
cells themselves may be used in situations where it is
important not only to retain the structural and functional
characteristics of the NHP, but to assess biological activity,
e.g., in drug screening assays.

The expression systems that may be used for purposes of
the invention include but are not limited to microorganisms
such as bacteria (e.g., E. coli, B. subtilis) transformed with
recombinant bacteriophage DNA, plasmid DNA or cosmid
DNA expression vectors containing NHP nucleotide
sequences; yeast (e.g., Saccharomyces, Pichia) transformed
with recombinant yeast expression vectors containing NHP
nucleotide sequences; insect cell systems infected with
recombinant virus expression vectors (e.g., baculovirus)
containing NHP sequences; plant cell systems infected with
recombinant virus expression vectors (e.g., cauliflower
mosaic virus, CaMV; tobacco mosaic virus, TMV) or
transformed with recombinant plasmid expression vectors (e.g.,
Ti plasmid) containing NHP nucleotide sequences; or
mammalian cell systems (e.g., COS, CHO, BHK, 293, 3T3)
harboring recombinant expression constructs containing
promoters derived from the genome of mammalian cells
(e.g., metallothionein promoter) or from mammalian viruses
(e.g., the adenovirus late promoter; the vaccinia virus 7.5K
promoter).

In bacterial systems, a number of expression vectors may
be advantageously selected depending upon the use intended
for the NHP product being expressed. For example, when a
large quantity of such a protein is to be produced for the
generation of pharmaceutical compositions of or containing
NHP, or for raising antibodies to a NHP, vectors that direct
the expression of high levels of fusion protein products that
are readily purified may be desirable. Such vectors include,
but are not limited to, the E. coli expression vector pUR278
(Ruther et al., 1983, EMBO J. 2:1791), in which a NHP
coding sequence may be ligated individually into the vector
in frame with the lacZ coding region so that a fusion protein
is produced; pIN vectors (Inouye & Inouye, 1985, Nucleic
Acids Res. 13:3101-3109; Van Heeke & Schuster, 1989, J.
Biol. Chem. 264:5503-5509); and the like. pGEX vectors
may also be used to express foreign polypeptides as fusion
proteins with glutathione S-transferase (GST). In general,
such fusion proteins are soluble and can easily be purified
from lysed cells by adsorption to glutathione-agarose beads
followed by elution in the presence of free glutathione. The
pGEX vectors are designed to include thrombin or factor Xa
protease cleavage sites so that the cloned target gene product
can be released from the GST moiety.

In an insect system, Autographa californica nuclear
polyhedrosis virus (AcNPV) is used as a vector to express
foreign polynucleotides. The virus grows in Spodoptera
frugiperda cells. A NHP encoding polynucleotide sequence
can be cloned individually into non-essential regions (for
example the polyhedrin gene) of the virus and placed under
control of an AcNPV promoter (for example the polyhedrin
promoter). Successful insertion of NHP coding sequence
will result in inactivation of the polyhedrin gene and
production of non-occluded recombinant virus (i.e., virus
lacking the proteinaceous coat coded for by the polyhedrin
gene). These recombinant viruses are then used to infect
Spodoptera frugiperda cells in which the inserted
polynucleotide is expressed (e.g., see Smith et al., 1983, J. Virol.
46:584; Smith, U.S. Pat. No. 4,215,051).

In mammalian host cells, a number of viral-based
expression systems may be utilized. In cases where an adenovirus
is used as an expression vector, the NHP nucleotide
sequence of interest may be ligated to an adenovirus
transcription/translation control complex, e.g., the late
promoter and tripartite leader sequence. This chimeric gene can
then be inserted in the adenovirus genome by in vitro or in
vivo recombination. Insertion in a non-essential region of
the viral genome (e.g., region E1 or E3) will result in a
recombinant virus that is viable and capable of expressing a
NHP product in infected hosts (e.g., See Logan & Shenk,
1984, Proc. Natl. Acad. Sci. USA 81:3655-3659). Specific
initiation signals may also be required for efficient
translation of inserted NHP nucleotide sequences. These signals
include the ATG initiation codon and adjacent sequences. In
cases where an entire NHP gene or cDNA, including its own
initiation codon and adjacent sequences, is inserted into the
appropriate expression vector, no additional translational
control signals may be needed. However, in cases where
only a portion of a NHP coding sequence is inserted,
exogenous translational control signals, including, perhaps,
the ATG initiation codon, must be provided. Furthermore,
the initiation codon must be in phase with the reading frame
of the desired coding sequence to ensure translation of the
entire insert. These exogenous translational control signals
and initiation codons can be of a variety of origins, both
natural and synthetic. The efficiency of expression may be
enhanced by the inclusion of appropriate transcription
enhancer elements, transcription terminators, etc. (See Bitter
et al., 1987, Methods in Enzymol. 153:516-544).

In addition, a host cell strain may be chosen that
modulates the expression of the inserted sequences, or modifies
and processes the gene product in the specific fashion
desired. Such modifications (e.g., glycosylation) and
processing (e.g., cleavage) of protein products may be
important for the function of the protein. Different host cells have
characteristic and specific mechanisms for the
post-translational processing and modification of proteins and
gene products. Appropriate cell lines or host systems can be
chosen to ensure the correct modification and processing of
the foreign protein expressed. To this end, eukaryotic host
cells which possess the cellular machinery for proper
processing of the primary transcript, glycosylation, and
phosphorylation of the gene product may be used. Such
mammalian host cells include, but are not limited to, CHO,
VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and in
particular, human cell lines.

For long-term, high-yield production of recombinant
proteins, stable expression is preferred. For example, cell
lines that stably express the NHP sequences described above
can be engineered. Rather than using expression vectors
which contain viral origins of replication, host cells can be
transformed with DNA controlled by appropriate expression
control elements (e.g., promoter, enhancer sequences,
transcription terminators, polyadenylation sites, etc.), and a
selectable marker. Following the introduction of the foreign
DNA, engineered cells may be allowed to grow for 1-2 days
in an enriched media, and then are switched to a selective
media. The selectable marker in the recombinant plasmid
confers resistance to the selection and allows cells to stably
integrate the plasmid into their chromosomes and grow to
form foci which in turn can be cloned and expanded into cell
lines. This method may advantageously be used to engineer
cell lines which express the NHP product. Such engineered
cell lines may be particularly useful in screening and
evaluation of compounds that affect the endogenous activity of the
NHP product.

A number of selection systems can be used, including but
not limited to the herpes simplex virus thymidine kinase
(Wigler, et al., 1977, Cell 11:223), hypoxanthine-guanine
phosphoribosyl transferase (Szybalska & Szybalski, 1962,
Proc. Natl. Acad. Sci. USA 48:2026), and adenine
phosphoribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes
can be employed in tk⁻, hgprt⁻ or aprt⁻ cells, respectively.
Also, antimetabolite resistance can be used as the basis of
selection for the following genes: dhfr, which confers
resistance to methotrexate (Wigler, et al., 1980, Proc. Natl. Acad.
Sci. USA 77:3567; O'Hare, et al., 1981, Proc. Natl. Acad. Sci.
USA 78:1527); gpt, which confers resistance to
mycophenolic acid (Mulligan & Berg, 1981, Proc. Natl. Acad. Sci.
USA 78:2072); neo, which confers resistance to the
aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. Mol.
Biol. 150:1); and hygro, which confers resistance to
hygromycin (Santerre, et al., 1984, Gene 30:147).

Alternatively, any fusion protein can be readily purified
by utilizing an antibody specific for the fusion protein being
expressed. For example, a system described by Janknecht et
al. allows for the ready purification of non-denatured fusion
proteins expressed in human cell lines (Janknecht, et al.,
1991, Proc. Natl. Acad. Sci. USA 88:8972-8976). In this
system, the polynucleotide of interest is subcloned into a
vaccinia recombination plasmid such that the gene's open
reading frame is translationally fused to an amino-terminal
tag consisting of six histidine residues. Extracts from cells
infected with recombinant vaccinia virus are loaded onto
Ni2+ nitriloacetic acid-agarose columns and histidine-tagged
proteins are selectively eluted with imidazole-containing
buffers.

Also encompassed by the present invention are fusion 
proteins that direct the NHP to a target organ and/or facilitate 
transport across the membrane into the cytosol. Conjugation 
of NHPs to antibody molecules or their Fab fragments could 
be used to target cells bearing a particular epitope. Attaching 
the appropriate signal sequence to the NHP would also 
transport the NHP to the desired location within the cell. 
Alternatively, targeting of NHP or its nucleic acid sequence 
might be achieved using liposome or lipid complex based 
delivery systems. Such technologies are described in Lipo- 
somes: A Practical Approach, New, RRC ed., Oxford Uni- 
versity Press, New York and in U.S. Pat. Nos. 4,594,595, 
5,459,127, 5,948,767 and 6,110,490 and their respective 
disclosures which are herein incorporated by reference in 
their entirety. Additionally embodied are novel protein con- 
structs engineered in such a way that they facilitate transport 
of the NHP to the target site or desired organ, where they 
cross the cell membrane and/or the nucleus where the NHP 
can exert its functional activity. This goal may be achieved 
by coupling of the NHP to a cytokine or other ligand that 
provides targeting specificity, and/or to a protein transducing 
domain (see generally U.S. application Ser. No. 60/111,701 
and 60/056,713, both of which are herein incorporated by 
reference, for examples of such transducing sequences) to 
facilitate passage across cellular membranes and can option- 
ally be engineered to include nuclear localization sequences. 

5.3 Antibodies to NHP Products 

Antibodies that specifically recognize one or more 
epitopes of a NHP, or epitopes of conserved variants of a 
NHP, or peptide fragments of a NHP are also encompassed 
by the invention. Such antibodies include but are not limited 
to polyclonal antibodies, monoclonal antibodies (mAbs), 
humanized or chimeric antibodies, single chain antibodies, 
Fab fragments, F(ab')2 fragments, fragments produced by a 
Fab expression library, anti-idiotypic (anti-Id) antibodies, 
and epitope-binding fragments of any of the above. 

The antibodies of the invention can be used, for example, 
in the detection of NHP in a biological sample and may, 
therefore, be utilized as part of a diagnostic or prognostic 
technique whereby patients may be tested for abnormal 
amounts of NHP. Such antibodies may also be utilized in 
conjunction with, for example, compound screening 
schemes for the evaluation of the effect of test compounds 
on expression and/or activity of a NHP gene product. 
Additionally, such antibodies can be used in conjunction 
with gene therapy to, for example, evaluate the normal and/or 
engineered NHP-expressing cells prior to their introduction 
into the patient. Such antibodies may additionally be used as 
a method for the inhibition of abnormal NHP activity. Such 
antibodies may, therefore, be utilized as part of treatment 
methods. 

For the production of antibodies, various host animals 
may be immunized by injection with the NHP, a NHP 
peptide (e.g., one corresponding to a functional domain of a 
NHP), truncated NHP polypeptides (NHP in which one or 
more domains have been deleted), functional equivalents of 
the NHP or mutated variant of the NHP. Such host animals 
may include but are not limited to pigs, rabbits, mice, goats, 
and rats, to name but a few. Various adjuvants may be used 
to increase the immunological response, depending on the 
host species, including but not limited to Freund's adjuvant 
(complete and incomplete), mineral salts such as aluminum 
hydroxide or aluminum phosphate, surface active substances 
such as lysolecithin, pluronic polyols, polyanions, peptides, 
oil emulsions, and potentially useful human adjuvants such 
as BCG (bacille Calmette-Guerin) and Corynebacterium 
parvum. Alternatively, the immune response could be 
enhanced by combination and/or coupling with molecules 
such as keyhole limpet hemocyanin, tetanus toxoid, diphtheria 
toxoid, ovalbumin, cholera toxin or fragments thereof. 
Polyclonal antibodies are heterogeneous populations of 
antibody molecules derived from the sera of the immunized 
animals. 

Monoclonal antibodies, which are homogeneous popula- 
tions of antibodies to a particular antigen, can be obtained by 
any technique which provides for the production of antibody 
molecules by continuous cell lines in culture. These include, 
but are not limited to, the hybridoma technique of Kohler 
and Milstein (1975, Nature 256:495-497; and U.S. Pat. No. 
4,376,110), the human B-cell hybridoma technique (Kosbor 
et al., 1983, Immunology Today 4:72; Cole et al., 1983, 
Proc. Natl. Acad. Sci. USA 80:2026-2030), and the 
EBV-hybridoma technique (Cole et al., 1985, Monoclonal 
Antibodies And Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). 
Such antibodies may be of any immunoglobulin class 
including IgG, IgM, IgE, IgA, IgD and any subclass thereof. 
The hybridoma producing the mAb of this invention may be 
cultivated in vitro or in vivo. Production of high titers of 
mAbs in vivo makes this the presently preferred method of 
production. 

In addition, techniques developed for the production of 
"chimeric antibodies" (Morrison et al., 1984, Proc. Natl. 
Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature, 
312:604-608; Takeda et al., 1985, Nature, 314:452-454) by 
splicing the genes from a mouse antibody molecule of 
appropriate antigen specificity together with genes from a 
human antibody molecule of appropriate biological activity 
can be used. A chimeric antibody is a molecule in which 
different portions are derived from different animal species, 
such as those having a variable region derived from a murine 
mAb and a human immunoglobulin constant region. Such 
technologies are described in U.S. Pat. Nos. 6,075,181 and 
5,877,397 and their respective disclosures which are herein 
incorporated by reference in their entirety. Also encompassed 
by the present invention is the use of fully humanized 
monoclonal antibodies as described in U.S. Pat. No. 6,150,584 
and respective disclosures which are herein incorporated 
by reference in their entirety. 

Alternatively, techniques described for the production of 
single chain antibodies (U.S. Pat. No. 4,946,778; Bird, 1988, 
Science 242:423-426; Huston et al., 1988, Proc. Natl. Acad. 
Sci. USA 85:5879-5883; and Ward et al., 1989, Nature 
341:544-546) can be adapted to produce single chain 
antibodies against NHP gene products. Single chain antibodies 
are formed by linking the heavy and light chain fragments of 
the Fv region via an amino acid bridge, resulting in a single 
chain polypeptide. 

Antibody fragments which recognize specific epitopes 
may be generated by known techniques. For example, such 
fragments include, but are not limited to: the F(ab')2 
fragments which can be produced by pepsin digestion of the 
antibody molecule and the Fab fragments which can be 
generated by reducing the disulfide bridges of the F(ab')2 
fragments. Alternatively, Fab expression libraries may be 
constructed (Huse et al., 1989, Science, 246:1275-1281) to 
allow rapid and easy identification of monoclonal Fab 
fragments with the desired specificity. 

Antibodies to a NHP can, in turn, be utilized to generate 
anti-idiotype antibodies that "mimic" a given NHP, using 
techniques well known to those skilled in the art (see, e.g., 
Greenspan & Bona, 1993, FASEB J 7(5):437-444; and 
Nissinoff, 1991, J. Immunol. 147(8):2429-2438). For 
example, antibodies which bind to a NHP domain and 
competitively inhibit the binding of NHP to its cognate 
receptor/ligand can be used to generate anti-idiotypes that 
"mimic" the NHP and, therefore, bind, activate, or neutralize 
a NHP, NHP receptor, or NHP ligand. Such anti-idiotypic 
antibodies or Fab fragments of such anti-idiotypes can be 
used in therapeutic regimens involving a NHP mediated 
pathway. 

The present invention is not to be limited in scope by the 
specific embodiments described herein, which are intended 
as single illustrations of individual aspects of the invention, 
and functionally equivalent methods and components are 
within the scope of the invention. Indeed, various modifi- 
cations of the invention, in addition to those shown and 
described herein will become apparent to those skilled in the 
art from the foregoing description. Such modifications are 
intended to fall within the scope of the appended claims. All 
cited publications, patents, and patent applications are herein 
incorporated by reference in their entirety. 



SEQUENCE LISTING 

<160> NUMBER OF SEQ ID NOS : 5 

<210> SEQ ID NO 1 

<211> LENGTH: 1662 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 1 



atggtggaca tgggggccct ggacaacctg atcgccaaca ccgcctacct gcaggcccgg 60 
aagccctcgg actgcgacag caaagagctg cagcggcggc ggcgtagcct ggccctgccc 120 
gggctgcagg gctgcgcgga gctccgccag aagctgtccc tgaacttcca cagcctgtgt 180 
gagcagcagc ccatcggtcg ccgcctcttc cgtgacttcc tagccacagt gcccacgttc 240 
cgcaaggcgg caaccttcct agaggacgtg cagaactggg agctggccga ggagggaccc 300 
accaaagaca gcgcgctgca ggggctggtg gccacttgtg cgagtgcccc tgccccgggg 360 
aacccgcaac ccttcctcag ccaggccgtg gccaccaagt gccaagcagc caccactgag 420 
gaagagcgag tggctgcagt gacgctggcc aaggctgagg ccatggcttt cttgcaagag 480 
cagcccttta aggatttcgt gaccagcgcc ttctacgaca agtttctgca gtggaaactc 540 
ttcgagatgc aaccagtgtc agacaagtac ttcactgagt tcagagtgct ggggaaaggt 600 
ggttttgggg aggtatgtgc cgtccaggtg aaaaacactg ggaagatgta tgcctgtaag 660 
aaactggaca agaagcggct gaagaagaaa ggtggcgaga agatggctct cttggaaaag 720 
gaaatcttgg agaaggtcag cagccctttc attgtctctc tggcctatgc ctttgagagc 780 
aagacccatc tctgccttgt catgagcctg atgaatgggg gagacctcaa gttccacatc 840 
tacaacgtgg gcacgcgtgg cctggacatg agccgggtga tcttttactc ggcccagata 900 
gcctgtggga tgctgcacct ccatgaactc ggcatcgtct atcgggacat gaagcctgag 960 
aatgtgcttc tggatgacct cggcaactgc aggttatctg acctggggct ggccgtggag 1020 
atgaagggtg gcaagcccat cacccagagg gctggaacca atggttacat ggctcctgag 1080 
atcctaatgg aaaaggtaag ttattcctat cctgtggact ggtttgccat gggatgcagc 1140 
atttatgaaa tggttgctgg acgaacacca ttcaaagatt acaaggaaaa ggtcagtaaa 1200 
gaggatctga agcaaagaac tctgcaagac gaggtcaaat tccagcatga taacttcaca 1260 
gaggaagcaa aagatatttg caggctcttc ttggctaaga aaccagagca acgcttagga 1320 
agcagagaaa agtctgatga tcccaggaaa catcatttct ttaaaacgat caactttcct 1380 
cgcctggaag ctggcctaat tgaaccccca tttgtgccag acccttcagt ggtttatgcc 1440 
aaagacatcg ctgaaattga tgatttctct gaggttcggg gggtggaatt tgatgacaaa 1500 
gataagcagt tcttcaaaaa ctttgcgaca ggtgctgttc ctatagcatg gcaggaagaa 1560 
attatagaaa cgggactgtt tgaggaactg aatgacccca acagacctac gggttgtgag 1620 
gagggtaatt catccaagtc tggcgtgtgt ttgttattgt aa 1662 



<210> SEQ ID NO 2 

<211> LENGTH: 553 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 2 

Met Val Asp Met Gly Ala Leu Asp Asn Leu Ile Ala Asn Thr Ala Tyr 
1 5 10 15 

Leu Gln Ala Arg Lys Pro Ser Asp Cys Asp Ser Lys Glu Leu Gln Arg 
20 25 30 

Arg Arg Arg Ser Leu Ala Leu Pro Gly Leu Gln Gly Cys Ala Glu Leu 
35 40 45 

Arg Gln Lys Leu Ser Leu Asn Phe His Ser Leu Cys Glu Gln Gln Pro 
50 55 60 

Ile Gly Arg Arg Leu Phe Arg Asp Phe Leu Ala Thr Val Pro Thr Phe 
65 70 75 80 

Arg Lys Ala Ala Thr Phe Leu Glu Asp Val Gln Asn Trp Glu Leu Ala 
85 90 95 

Glu Glu Gly Pro Thr Lys Asp Ser Ala Leu Gln Gly Leu Val Ala Thr 
100 105 110 

Cys Ala Ser Ala Pro Ala Pro Gly Asn Pro Gln Pro Phe Leu Ser Gln 
115 120 125 

Ala Val Ala Thr Lys Cys Gln Ala Ala Thr Thr Glu Glu Glu Arg Val 
130 135 140 

Ala Ala Val Thr Leu Ala Lys Ala Glu Ala Met Ala Phe Leu Gln Glu 
145 150 155 160 

Gln Pro Phe Lys Asp Phe Val Thr Ser Ala Phe Tyr Asp Lys Phe Leu 
165 170 175 

Gln Trp Lys Leu Phe Glu Met Gln Pro Val Ser Asp Lys Tyr Phe Thr 
180 185 190 

Glu Phe Arg Val Leu Gly Lys Gly Gly Phe Gly Glu Val Cys Ala Val 
195 200 205 

Gln Val Lys Asn Thr Gly Lys Met Tyr Ala Cys Lys Lys Leu Asp Lys 
210 215 220 

Lys Arg Leu Lys Lys Lys Gly Gly Glu Lys Met Ala Leu Leu Glu Lys 
225 230 235 240 

Glu Ile Leu Glu Lys Val Ser Ser Pro Phe Ile Val Ser Leu Ala Tyr 
245 250 255 

Ala Phe Glu Ser Lys Thr His Leu Cys Leu Val Met Ser Leu Met Asn 
260 265 270 

Gly Gly Asp Leu Lys Phe His Ile Tyr Asn Val Gly Thr Arg Gly Leu 
275 280 285 

Asp Met Ser Arg Val Ile Phe Tyr Ser Ala Gln Ile Ala Cys Gly Met 
290 295 300 

Leu His Leu His Glu Leu Gly Ile Val Tyr Arg Asp Met Lys Pro Glu 
305 310 315 320 

Asn Val Leu Leu Asp Asp Leu Gly Asn Cys Arg Leu Ser Asp Leu Gly 
325 330 335 

Leu Ala Val Glu Met Lys Gly Gly Lys Pro Ile Thr Gln Arg Ala Gly 
340 345 350 

Thr Asn Gly Tyr Met Ala Pro Glu Ile Leu Met Glu Lys Val Ser Tyr 
355 360 365 

Ser Tyr Pro Val Asp Trp Phe Ala Met Gly Cys Ser Ile Tyr Glu Met 
370 375 380 

Val Ala Gly Arg Thr Pro Phe Lys Asp Tyr Lys Glu Lys Val Ser Lys 
385 390 395 400 

Glu Asp Leu Lys Gln Arg Thr Leu Gln Asp Glu Val Lys Phe Gln His 
405 410 415 

Asp Asn Phe Thr Glu Glu Ala Lys Asp Ile Cys Arg Leu Phe Leu Ala 
420 425 430 

Lys Lys Pro Glu Gln Arg Leu Gly Ser Arg Glu Lys Ser Asp Asp Pro 
435 440 445 

Arg Lys His His Phe Phe Lys Thr Ile Asn Phe Pro Arg Leu Glu Ala 
450 455 460 

Gly Leu Ile Glu Pro Pro Phe Val Pro Asp Pro Ser Val Val Tyr Ala 
465 470 475 480 

Lys Asp Ile Ala Glu Ile Asp Asp Phe Ser Glu Val Arg Gly Val Glu 
485 490 495 

Phe Asp Asp Lys Asp Lys Gln Phe Phe Lys Asn Phe Ala Thr Gly Ala 
500 505 510 

Val Pro Ile Ala Trp Gln Glu Glu Ile Ile Glu Thr Gly Leu Phe Glu 
515 520 525 

Glu Leu Asn Asp Pro Asn Arg Pro Thr Gly Cys Glu Glu Gly Asn Ser 
530 535 540 

Ser Lys Ser Gly Val Cys Leu Leu Leu 
545 550 



<210> SEQ ID NO 3 

<211> LENGTH: 1062 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 3 

atggtggaca tgggggccct ggacaacctg atcgccaaca ccgcctacct gcaggcccgg 60 

aagccctcgg actgcgacag caaagagctg cagcggcggc ggcgtagcct ggccctgccc 120 

gggctgcagg gctgcgcgga gctccgccag aagctgtccc tgaacttcca cagcctgtgt 180 

gagcagcagc ccatcggtcg ccgcctcttc cgtgacttcc tagccacagt gcccacgttc 240 

cgcaaggcgg caaccttcct agaggacgtg cagaactggg agctggccga ggagggaccc 300 

accaaagaca gcgcgctgca ggggctggtg gccacttgtg cgagtgcccc tgccccgggg 360 

aacccgcaac ccttcctcag ccaggccgtg gccaccaagt gccaagcagc caccactgag 420 

gaagagcgag tggctgcagt gacgctggcc aaggctgagg ccatggcttt cttgcaagag 480 

cagcccttta aggatttcgt gaccagcgcc ttctacgaca agtttctgca gtggaaactc 540 

ttcgagatgc aaccagtgtc agacaagtac ttcactgagt tcagagtgct ggggaaaggt 600 

ggttttgggg aggtatgtgc cgtccaggtg aaaaacactg ggaagatgta tgcctgtaag 660 

aaactggaca agaagcggct gaagaagaaa ggtggcgaga agatggctct cttggaaaag 720 

gaaatcttgg agaaggtcag cagccctttc attgtctctc tggcctatgc ctttgagagc 780 

aagacccatc tctgccttgt catgagcctg atgaatgggg gagacctcaa gttccacatc 840 

tacaacgtgg gcacgcgtgg cctggacatg agccgggtga tcttttactc ggcccagata 900 
gcctgtggga tgctgcacct ccatgaactc ggcatcgtct atcgggacat gaagcctgag 960 

aatgtgcttc tggatgacct cggcaactgc aggttatctg acctggggct ggccgtggag 1020 

atgaagggtg gcaagcccat cacccagagg agaaaagtct ga 1062 



<210> SEQ ID NO 4 

<211> LENGTH: 353 

<212> TYPE: PRT 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 4 

Met Val Asp Met Gly Ala Leu Asp Asn Leu Ile Ala Asn Thr Ala Tyr 
1 5 10 15 

Leu Gln Ala Arg Lys Pro Ser Asp Cys Asp Ser Lys Glu Leu Gln Arg 
20 25 30 

Arg Arg Arg Ser Leu Ala Leu Pro Gly Leu Gln Gly Cys Ala Glu Leu 
35 40 45 

Arg Gln Lys Leu Ser Leu Asn Phe His Ser Leu Cys Glu Gln Gln Pro 
50 55 60 

Ile Gly Arg Arg Leu Phe Arg Asp Phe Leu Ala Thr Val Pro Thr Phe 
65 70 75 80 

Arg Lys Ala Ala Thr Phe Leu Glu Asp Val Gln Asn Trp Glu Leu Ala 
85 90 95 

Glu Glu Gly Pro Thr Lys Asp Ser Ala Leu Gln Gly Leu Val Ala Thr 
100 105 110 

Cys Ala Ser Ala Pro Ala Pro Gly Asn Pro Gln Pro Phe Leu Ser Gln 
115 120 125 

Ala Val Ala Thr Lys Cys Gln Ala Ala Thr Thr Glu Glu Glu Arg Val 
130 135 140 

Ala Ala Val Thr Leu Ala Lys Ala Glu Ala Met Ala Phe Leu Gln Glu 
145 150 155 160 

Gln Pro Phe Lys Asp Phe Val Thr Ser Ala Phe Tyr Asp Lys Phe Leu 
165 170 175 

Gln Trp Lys Leu Phe Glu Met Gln Pro Val Ser Asp Lys Tyr Phe Thr 
180 185 190 

Glu Phe Arg Val Leu Gly Lys Gly Gly Phe Gly Glu Val Cys Ala Val 
195 200 205 

Gln Val Lys Asn Thr Gly Lys Met Tyr Ala Cys Lys Lys Leu Asp Lys 
210 215 220 

Lys Arg Leu Lys Lys Lys Gly Gly Glu Lys Met Ala Leu Leu Glu Lys 
225 230 235 240 

Glu Ile Leu Glu Lys Val Ser Ser Pro Phe Ile Val Ser Leu Ala Tyr 
245 250 255 

Ala Phe Glu Ser Lys Thr His Leu Cys Leu Val Met Ser Leu Met Asn 
260 265 270 

Gly Gly Asp Leu Lys Phe His Ile Tyr Asn Val Gly Thr Arg Gly Leu 
275 280 285 

Asp Met Ser Arg Val Ile Phe Tyr Ser Ala Gln Ile Ala Cys Gly Met 
290 295 300 

Leu His Leu His Glu Leu Gly Ile Val Tyr Arg Asp Met Lys Pro Glu 
305 310 315 320 

Asn Val Leu Leu Asp Asp Leu Gly Asn Cys Arg Leu Ser Asp Leu Gly 
325 330 335 

Leu Ala Val Glu Met Lys Gly Gly Lys Pro Ile Thr Gln Arg Arg Lys 
340 345 350 

Val 
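
The correspondence between the nucleotide listing of SEQ ID NO 3 and the amino acid listing of SEQ ID NO 4 can be checked with a short translation routine. The Python sketch below is offered only as an illustration: it assumes the standard genetic code, and the names `CODON_TABLE`, `THREE_LETTER` and `translate` are hypothetical helpers, not part of the specification. The two input rows are the first and last rows of SEQ ID NO 3 as listed above.

```python
from itertools import product

# Standard genetic code, codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def translate(row: str) -> list[str]:
    """Translate a listing row, ignoring spaces and position numbers; stop at a stop codon."""
    seq = "".join(ch for ch in row.lower() if ch in "acgt")
    residues = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":          # stop codon ends translation
            break
        residues.append(THREE_LETTER[aa])
    return residues


FIRST_ROW = "atggtggaca tgggggccct ggacaacctg atcgccaaca ccgcctacct gcaggcccgg 60"
LAST_ROW = "atgaagggtg gcaagcccat cacccagagg agaaaagtct ga 1062"

print(" ".join(translate(FIRST_ROW)))
print(" ".join(translate(LAST_ROW)))
```

Both rows begin on a codon boundary (positions 1 and 1021), so each can be translated on its own and compared with residues 1-20 and 341-353 of SEQ ID NO 4.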

<210> SEQ ID NO 5 

<211> LENGTH: 2249 

<212> TYPE: DNA 

<213> ORGANISM: homo sapiens 

<400> SEQUENCE: 5 



aaaactgctc tgaggccatc atgctttgag gaagcccagg aggaaacact gcagagaggc 60 
tcaaccaccc cagctctccc agctgagctc agccacccac cgatccccca gctgaatgca 120 
accataagag tgagtccagg ttctaccctg ctaggctgcc accacattcc taagaaccac 180 
gggaaaaggc atttgctcct ccgaagaaat tctcagactg atttttcact gtattgtcag 240 
gccacaggac tcactgtaaa tcccttggac gttgtctcac ccgggaaggg aaagcagcca 300 
gcagccctcc agccctcttg tgctttccct gggagtgcgc cccgtgctca gccatggtgg 360 
acatgggggc cctggacaac ctgatcgcca acaccgccta cctgcaggcc cggaagccct 420 
cggactgcga cagcaaagag ctgcagcggc ggcggcgtag cctggccctg cccgggctgc 480 
agggctgcgc ggagctccgc cagaagctgt ccctgaactt ccacagcctg tgtgagcagc 540 
agcccatcgg tcgccgcctc ttccgtgact tcctagccac agtgcccacg ttccgcaagg 600 
cggcaacctt cctagaggac gtgcagaact gggagctggc cgaggaggga cccaccaaag 660 
acagcgcgct gcaggggctg gtggccactt gtgcgagtgc ccctgccccg gggaacccgc 720 
aacccttcct cagccaggcc gtggccacca agtgccaagc agccaccact gaggaagagc 780 
gagtggctgc agtgacgctg gccaaggctg aggccatggc tttcttgcaa gagcagccct 840 
ttaaggattt cgtgaccagc gccttctacg acaagtttct gcagtggaaa ctcttcgaga 900 
tgcaaccagt gtcagacaag tacttcactg agttcagagt gctggggaaa ggtggttttg 960 
gggaggtatg tgccgtccag gtgaaaaaca ctgggaagat gtatgcctgt aagaaactgg 1020 
acaagaagcg gctgaagaag aaaggtggcg agaagatggc tctcttggaa aaggaaatct 1080 
tggagaaggt cagcagccct ttcattgtct ctctggccta tgcctttgag agcaagaccc 1140 
atctctgcct tgtcatgagc ctgatgaatg ggggagacct caagttccac atctacaacg 1200 
tgggcacgcg tggcctggac atgagccggg tgatctttta ctcggcccag atagcctgtg 1260 
ggatgctgca cctccatgaa ctcggcatcg tctatcggga catgaagcct gagaatgtgc 1320 
ttctggatga cctcggcaac tgcaggttat ctgacctggg gctggccgtg gagatgaagg 1380 
gtggcaagcc catcacccag agggctggaa ccaatggtta catggctcct gagatcctaa 1440 
tggaaaaggt aagttattcc tatcctgtgg actggtttgc catgggatgc agcatttatg 1500 
aaatggttgc tggacgaaca ccattcaaag attacaagga aaaggtcagt aaagaggatc 1560 
tgaagcaaag aactctgcaa gacgaggtca aattccagca tgataacttc acagaggaag 1620 
caaaagatat ttgcaggctc ttcttggcta agaaaccaga gcaacgctta ggaagcagag 1680 
aaaagtctga tgatcccagg aaacatcatt tctttaaaac gatcaacttt cctcgcctgg 1740 
aagctggcct aattgaaccc ccatttgtgc cagacccttc agtggtttat gccaaagaca 1800 
tcgctgaaat tgatgatttc tctgaggttc ggggggtgga atttgatgac aaagataagc 1860 
agttcttcaa aaactttgcg acaggtgctg ttcctatagc atggcaggaa gaaattatag 1920 
aaacgggact gtttgaggaa ctgaatgacc ccaacagacc tacgggttgt gaggagggta 1980 
attcatccaa gtctggcgtg tgtttgttat tgtaaattgc tctctttacc agacaggcag 2040 
caggagtctc ggctgacata atcctcgaat gttccacacg tggaaatctg tggaatgagg 2100 
gctaatcagt taggagggac atcacaacca caaaacaatt caaaagacag gcaagctcac 2160 
tactagaaca cattttattt tctttttctt tcttcataaa gatgagtaaa gtctcagttt 2220 
tcactgaggg cagggaaaag gaacactca 2249 
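
The position of the SEQ ID NO 1 coding region within SEQ ID NO 5 can be located by a simple substring search. The Python sketch below is illustrative only; the helper names `clean` and `locate` are hypothetical. It uses the rows of SEQ ID NO 5 covering positions 301-420 and the first 30 nucleotides of the open reading frame as they appear in SEQ ID NO 3, and takes the 1662-nucleotide length from the SEQ ID NO 1 header.

```python
def clean(rows: str) -> str:
    """Keep only nucleotide letters, dropping spaces, newlines and position numbers."""
    return "".join(ch for ch in rows.lower() if ch in "acgt")


def locate(window: str, probe: str, offset: int) -> int:
    """Return the 1-based position of probe, given that window starts after `offset` bases."""
    idx = clean(window).find(clean(probe))
    if idx < 0:
        raise ValueError("probe not found in window")
    return offset + idx + 1


# Rows of SEQ ID NO 5 covering positions 301-420.
SEQ5_ROWS_301_420 = """
gcagccctcc agccctcttg tgctttccct gggagtgcgc cccgtgctca gccatggtgg 360
acatgggggc cctggacaac ctgatcgcca acaccgccta cctgcaggcc cggaagccct 420
"""
ORF_FIRST_30 = "atggtggaca tgggggccct ggacaacctg"   # start of the ORF (SEQ ID NO 3, row 1)
ORF_LENGTH = 1662                                    # from <211> of SEQ ID NO 1

start = locate(SEQ5_ROWS_301_420, ORF_FIRST_30, offset=300)
end = start + ORF_LENGTH - 1
print(start, end)   # 354 2015
```

On this reading the initiation codon sits at positions 354-356 of SEQ ID NO 5 and the coding region ends at position 2015, which matches the taa triplet in the row ending at 2040.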





What is claimed is: 

1. An isolated nucleic acid molecule comprising the 
nucleotide sequence disclosed in SEQ ID NO: 1. 

2. An isolated nucleic acid molecule comprising a nucle- 
otide sequence that: 

(a) encodes the amino acid sequence shown in SEQ ID 
NO: 2; and 

(b) hybridizes under highly stringent conditions to the 
polynucleotide sequence of SEQ ID NO: 1 or the 
complete complement thereof. 



3. An isolated nucleic acid molecule comprising a nucle- 
otide sequence encoding the amino acid sequence shown in 
SEQ ID NO:2. 

4. An isolated nucleic acid molecule comprising a nucle- 
otide sequence encoding the amino acid sequence shown in 
SEQ ID NO:4. 
(57) ABSTRACT 

According to the present invention an in-line compounding/ 
extrusion deposition and molding apparatus and method of 
using the same are provided. The apparatus comprises a 
single step compounding and extrusion apparatus which 
includes an extruder screw. The apparatus includes a first 
zone, a second zone, and a third zone. The first zone is used 
to melt an inlet material before the screw advances the 
melted inlet material into the second zone which comprises 
a preparation and cutting zone. Simultaneously, as the inlet 
material is melted in the first zone, the screw rotation feeds 
a reinforcing fiber bundle into the second zone where the 
reinforcing fiber bundle is prepared for melt impregnation 
and is sheared to a desired length. While in the second zone, 
mixing begins between the melted inlet material and the 
sheared reinforcing fiber bundle. Next the mixture is 
advanced into the third zone for uniform distributive mixing 
and impregnation of the sheared reinforcing fiber bundle 
with the melted inlet material to form a fiber bundle filled 
melt. The apparatus includes at least one winding/unwinding 
reel which continuously ensures that the reinforcing fiber 
bundle is under constant tension (no sagging or breaking 
thereof) during the X, Y, Z movement of the apparatus 
during melt deposition as well as during forward movement 
of the screw within the barrel. 

16 Claims, 5 Drawing Sheets 
IN-LINE COMPOUNDING/EXTRUSION 
DEPOSITION AND MOLDING APPARATUS 
AND METHOD OF USING THE SAME 

The present invention relates generally to an apparatus 
and method of manufacturing a resin structure reinforced 
with long fibers and, more particularly, to an apparatus and 
method of manufacturing for a single-step in-line 
compounding of a reinforcing fiber with extrusion compression 
molding. 

BACKGROUND OF THE INVENTION 

Elongated resin structures reinforced with fibers in which 
thermoplastic resins are reinforced with continuous fibers 
have mechanical properties superior to those structures 
reinforced with short fibers. Such structures are beneficial 
because they can be cut and formed into pellets or similar 
materials. Elongated thermoplastic resin structures 
reinforced with fibers are generally manufactured by the 
so-called pultrusion method by impregnating a thermoplastic 
resin into a continuous reinforcement fiber bundle while 
the bundle is passed through a cross-head extrusion die, after 
which the resin-impregnated fiber bundle is drawn out 
through a die. After undergoing the pultrusion method, the 
structures are cut to a desired size. 

Other processes are used to produce elongated 
thermoplastic resin structures reinforced with fiber, for example, 
where first the plastic is melted in a long single screw 
extruder which is fed to another single screw extruder. Next 
chopped strands are fed into the melt, and the reinforcing 
fiber melt is pumped into an accumulator after which the 
required log size is cut and fed into a vertical molding press. 

Currently, there is no apparatus available employing a 
single-step process that utilizes a single screw extruder as a 
reinforcing fiber compounder and melt deposition unit. 
Likewise, there is no method currently available where 
reinforcing fibers are fed into a barrel of the apparatus such 
that the fibers are constantly maintained in a stretched 
condition regardless of the movement of the apparatus, 
thereby eliminating the possibility of fiber entanglement. 

SUMMARY OF THE INVENTION 

It is, therefore, desirable to provide an apparatus and 
process for in-line compounding of reinforcing fiber and 
molding in a single-step process. A reinforcing product is 
compounded by use of a reciprocating single screw extruder 
having a reinforcing fiber compounder and a melt depositing 
unit, where the reinforcing fibers are severed at a maximum 
desirable length and kept in a stretched tensioned condition 
regardless of the apparatus positioning such that there are no 
loose or sagging fibers during the process. 

Advantageously, the in-line compounding/extrusion 
apparatus of the present invention allows for the in-line 
compounding of reinforcing fibers with extrusion compression 
molding in a single step by utilizing a reciprocating 
single screw injection unit. Typically, the in-line compounding 
of reinforcing fibers with extrusion compression molding 
would involve high cost, bulky equipment consisting of 
combinations of single screw, twin screw and plunger 
deposition assemblies. By using a single-step process for the 
in-line compounding of reinforcing fibers and extrusion 
compression molding, the present invention offers a more 
cost effective method of producing a higher quality part. 

In accordance with the preferred embodiment of the 
present invention, there is provided a method to incorporate 
the continuous reinforcing fiber into the reciprocating single 
screw injection unit and to sever and uniformly impregnate 
the fine filaments with resin keeping maximum fiber length 
in the part. 

In accordance with another aspect of the preferred 
embodiment of the present invention the apparatus includes 
winding/unwinding reels and guides which prevent the 
reinforcing fibers from sagging and breaking during a melt 
deposition step or during forward and rear movement of the 
reciprocating single screw injection unit. 

The above and other objects and advantages of the 
invention will be apparent from [...] 
the accompanying drawings and the appended claims. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Referring now to the drawings wherein like elements are 
numbered alike in the several Figures: 

FIG. 1 is a cross-sectional side elevational view of one 
embodiment of an apparatus of the present invention in a 
first position; 

FIG. 2 is a cross-sectional side view of the apparatus of 
FIG. 1 in a second position; 

FIG. 3 is a cross-sectional side elevational view of the 
apparatus of FIG. 1 in a third position; 

FIG. 4 is a cross-sectional side elevational view of the 
apparatus of FIG. 1 in a fourth position; and 

FIG. 5 is a cross-sectional side elevational view of the 
apparatus of FIG. 1 in a fifth position. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENT 

A typical embodiment of an in-line compounding/ 
extrusion apparatus 10 embodying the present invention is 
shown in FIGS. 1-5. The apparatus 10 has a barrel 12 which 
in a preferred embodiment is cylindrical in shape. It will be 
appreciated that barrel 12 may have other shapes. Barrel 12 
includes an internal cavity 14 formed therein and extending 
along a longitudinal axis thereof. The barrel 12 has a first 
end 16, an opposite second end 18 and an outer surface 20. 
The internal cavity 14 extends from the first end 16 to the 
second end 18 of the barrel 12. Located at the second end 18 
of the barrel 12 is a die 22 including a blade 23 that opens 
and closes during operation of the apparatus 10 wherein the 
blade 23 is initially closed. The internal cavity 14 is 
preferably cylindrical in shape and has a diameter great enough 
to permit a screw 24 to be disposed therein. In the preferred 
embodiment, the screw 24 is sized to fit tightly in the internal 
cavity 14. In other words, the widest diameter of the screw 
24 is slightly less than the inner diameter of the internal 
cavity 14. The barrel 12 preferably includes a beveled 
portion 19 which comprises an annular beveled surface 
which has a shape complementary to the head 30 so that the 
beveled portion 19 acts as a stop for the screw 24 as the 
screw 24 is driven in a direction towards the second end 18. 
Beveled portion 19 includes and defines a central opening 
31. 

The screw 24 includes a first end 26 and an opposing 
second end 28. A head 30 is provided at the second end 28 
of the screw 24. When the screw 24 is inserted into the 
internal cavity 14 of the barrel 12, the head 30 is inserted 
into the internal cavity 14 at the first end 16 of the barrel 12 
and is advanced therein towards the opposing second end 18. 
The apparatus 10 further has a die 22 that is in selective fluid 
communication with the internal cavity 14 at the second end 
18 of the barrel 12. The blade 23 is designed to provide the 
selective fluid communication between the internal cavity 14 
and the die 22 so that in the closed position shown in FIGS. 
1 and 2, the screw 24 is prevented from advancing material 
into the die 22. As a result, material disposed within the 
internal cavity 14 of the barrel 12 is prevented from freely 
entering or communicating with the die 22. As is known in 
the extruding art, the screw 24 has a plurality of flights 25 
which are designed to advance the material through the 
internal cavity 14 as the screw 24 rotates and the material is 
picked up and advanced forward by the plurality of flights 
25. 

Apparatus 10 also includes a first inlet 32 which is in 
communication with the barrel 12 and more specifically with 
the internal cavity 14. The exemplary first inlet 32 comprises 
a bore extending from the outer surface 20 of the barrel 12 
and is generally perpendicular to the longitudinal axis of the 
barrel 12. The first inlet 32 opens into the internal cavity 14 
so that an inlet material 34 may be introduced thereto from 
outside of the apparatus 10. In the exemplary illustrated 
embodiment, the first inlet 32 is preferably cylindrical in 
shape, it being understood that the first inlet 32 may have 
other cross-sectional shapes. The diameter of the first inlet 
32 is of a sufficient dimension to permit the inlet material 34 
to be introduced therethrough into the internal cavity 14. 
Where the inlet material 34 enters the internal cavity 14, the 
screw 24 is designed to have its greatest flight depth so as to 
assure easy entry of the inlet material 34 and its conveyance 
forward under a high positive pressure. As the inlet material 
34 is introduced into the internal cavity 14, it contacts the 
screw 24 and is disposed therearound and between the 
plurality of flights 25 which serve to advance the inlet 
material 34 once the screw 24 is rotated. 

The apparatus 10 further includes a second inlet 36. 
Similar to the first inlet 32, the second inlet 36 comprises a 
bore extending from the outer surface 20 of the barrel 12.
The second inlet 36 is generally perpendicular to the longitudinal
axis of the barrel 12. The second inlet 36 opens into
the internal cavity 14 so that material may be introduced
thereto from outside of the apparatus 10. In the exemplary
illustrated embodiment, the second inlet 36 is preferably
cylindrical in shape. The second inlet 36 is positioned
intermediate the first inlet 32 and the second end 18 of the
barrel 12, and the respective axes of the first and second
inlets 32, 36 are substantially parallel to one another. In the
preferred embodiment, the second inlet 36 is cylindrical in
shape; however, as can be appreciated, other shapes can be 
utilized. The first inlet 32 is thus closer to the first end 16 of 
the barrel 12 than the second inlet 36. 

The first inlet 32 is designed to fit a hopper 35. The hopper
35 comprises a funnel-like holder capable of holding the
inlet material 34. The inlet material 34 being fed into the
internal cavity 14 through the first inlet 32 includes but is not 
limited to suitable thermoplastics and thermoset compounds. 

In one exemplary embodiment, the inlet material 34 comprises
a quantity of plastic pellets which are fed through the
hopper 35 and first inlet 32 into the internal cavity 14. The
inlet material 34 is melted prior to further processing,
wherein the melting is accomplished by maintaining a
predetermined compression ratio of the screw 24 as the inlet
material 34, e.g., plastic pellets, is advanced forward within
the internal cavity 14 by the plurality of flights 25. As is
known in the art, another method of describing the screw 24
is in terms of compression ratio. The compression ratio is
generally defined as the ratio of the channel depth in the
first flight of the first zone (feeding zone) to the channel
depth of the last flight in the first zone (feeding zone). The
channel depth (flight depth) is the distance from the outer
edge of a flight 25 to the outer surface of the screw 24. The
screw 24 is designed so that as the inlet material 34 enters
the internal cavity 14 through the first inlet 32, the depth of
the plurality of flights 25 of the screw 24 is decreased. By
decreasing the depth of the plurality of flights 25, the
compression ratio is increased. Between the first and second
inlets 32, 36 the depth of the flights 25 transitions quickly from
a deep flight depth to a shallow flight depth. The flight depth
should be deep enough to create a compression ratio greater
than about 3.5:1 in the first zone. In an exemplary
embodiment, the compression ratio is preferably about 8:1
to aid in the rapid melting of the incoming inlet material 34
before it reaches the second inlet 36.
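
The compression ratio defined above can be illustrated with a short calculation. The following is a minimal sketch only: the function names and the channel depths (in millimetres) are hypothetical values chosen for illustration and are not taken from the specification; only the 3.5:1 threshold and the 8:1 exemplary ratio come from the text.

```python
def compression_ratio(first_flight_depth: float, last_flight_depth: float) -> float:
    """Channel depth at the first flight of the feeding zone divided by
    the channel depth at the last flight of the same zone."""
    if first_flight_depth <= 0 or last_flight_depth <= 0:
        raise ValueError("channel depths must be positive")
    return first_flight_depth / last_flight_depth


def exceeds_first_zone_minimum(ratio: float, minimum: float = 3.5) -> bool:
    """True when the ratio is greater than the 'about 3.5:1' figure in the text."""
    return ratio > minimum


# Hypothetical depths: 16 mm at the first flight, 2 mm at the last flight.
ratio = compression_ratio(16.0, 2.0)
print(f"compression ratio = {ratio:.1f}:1")
print("exceeds 3.5:1 minimum:", exceeds_first_zone_minimum(ratio))
```

With these assumed depths the ratio works out to the exemplary 8:1; a shallower first flight or a deeper last flight lowers the ratio.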

The second inlet 36 preferably includes a reinforcing fiber 
guide 42. The guide 42 comprises any number of guides 
which are designed to separate individual reinforcing fiber 
bundles 44 from one another so that the individual reinforc- 
ing fiber bundles 44 do not become entangled with one 
another as they are fed into the internal cavity 14. It is
understood that the reinforcing fibers are commonly provided
in reinforcing fiber bundles 44 which are then fed into
the apparatus 10. An individual reinforcing fiber bundle 44
is also commonly referred to as a roving, which comprises
a number of fibers with defined diameters and special sizing.
For example, the second inlet 36 may be formed so that the 
guide 42 comprises at least one bore formed in and extend- 
ing through the second inlet 36, wherein one reinforcing 
fiber bundle 44 is received within one bore. Each reinforcing 
fiber bundle 44 is formed of any suitable number of rein- 
forcing fibers including but not limited to glass fibers, 
natural fibers, polyaramid fibers (e.g., Kevlar fibers com- 
mercially available from DuPont), carbon fibers or the like. 
Each reinforcing fiber bundle 44 is fed into the guide 42 
from at least one winding/unwinding reel 46. The preferred 
embodiment is shown with three (3) winding/unwinding 
reels 46 for three (3) reinforcing fiber bundles 44; however 
as few as one (1) and as many as desired can be utilized in 
the present invention. The guide 42 is useful in directing the 
reinforcing fiber bundles 44 into the proper location in the 
internal cavity 14 and works in conjunction with the 
winding/unwinding reels 46 to keep the reinforcing fiber 
bundle 44 in a constant taut state. In the exemplary embodi- 
ment shown, the guide 42 comprises a rotatable member 
having a plurality of grooves formed therein for separating 
individual reinforcing fiber bundles 44 from one another so 
that the individual fiber bundles 44 do not become entangled 
during the feeding process. 

The winding/unwinding reels 46 are preferably located 
above the guide 42 and are fed from an equal number of 
spools 48 containing the reinforcing fiber bundles 44. While 
the exemplary embodiment shows the spools 48 as having a 
round shape, it is understood that other shapes may be
used. There is one spool 48 feeding each winding/unwinding 
reel 46. The movement of the apparatus 10 and the screw 24 
sets in motion the winding/unwinding of the reinforcing 
fiber bundles 44 on the reels 46. When the reinforcing fiber 
bundles 44 are unwound from the reels 46, the rotation of the 
screw 24 results in the feeding of the reinforcing fiber 
bundles 44 into the guide 42 and thus into the internal cavity 
14 of the barrel 12 as well as the simultaneous plastication 
of the inlet material 34 which is introduced through the first 
inlet 32. The unwinding of the reinforcing fiber bundles 44 
from the winding/unwinding reels 46 results in the accom- 
panying unwinding from the associated spools 48. When the 
process is reversed, the reels 46 are wound as are the 
connected spools 48 so as not to allow for any slack in the 
reinforcing fiber bundles 44. Thus, the reinforcing fiber
bundles 44 are consistently under tension regardless of the
positioning of the apparatus 10. 

The internal cavity 14 of the barrel 12 is precisely sized 
to fit the screw 24 and allow a very narrow gap 50 to exist 
between the outer diameter of the screw 24 and the diameter 
of the internal cavity 14 of the barrel 12. The screw 24 has 
a preselected diameter (D) and length (L) such that L/D is
large, up to 35:1. Preferably, the L/D ratio is in the
range of 20:1 to 35:1. It is generally known that the higher
the L/D ratio, the greater the surface available for
shearing, mixing, and plasticating the inlet material 34.
Throughout operation the screw 24 is free to rotate
within the internal cavity 14; additionally, the screw 24 is
preset with back pressure allowing for the retraction (or 
movement away from the second end 18 of the barrel 12) of 
the screw 24 once an accumulation of material forms in front 
of the screw 24. The front of the screw 24 is defined as that 
area between the head 30 of the screw 24 and the blade 23 
when it is in a closed position. This is known as "shot size". 
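
As a quick illustration of the L/D figures above, the following sketch computes the ratio and checks it against the stated preferred range. The function names and the screw dimensions are hypothetical and serve only as an example; the 20:1 to 35:1 range is the one given in the text.

```python
def length_to_diameter(length_mm: float, diameter_mm: float) -> float:
    """L/D ratio of the screw for a given length and diameter."""
    if length_mm <= 0 or diameter_mm <= 0:
        raise ValueError("length and diameter must be positive")
    return length_mm / diameter_mm


def in_preferred_range(ld_ratio: float, low: float = 20.0, high: float = 35.0) -> bool:
    """True when the ratio falls inside the preferred 20:1 to 35:1 range."""
    return low <= ld_ratio <= high


# Hypothetical screw: 2400 mm long, 80 mm in diameter.
ld = length_to_diameter(2400.0, 80.0)
print(f"L/D = {ld:.0f}:1, preferred: {in_preferred_range(ld)}")
```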

The length of the barrel 12 is generally divided into at 
least three (3) zones, namely a first zone 52, a second zone 
56, and a third zone 62. Each of the zones 52, 56, 62 
performs an operation useful in the compounding/extrusion 
process as will be described in greater detail hereinafter. The 
first zone or a melting zone 52 is created as the inlet material 
34 is fed into the first inlet 32 under compression to form a 
melted plastic. The screw 24 design in this area has a deep 
flight depth to create a high compression ratio so that the 
inlet material 34 rapidly melts as the inlet material 34 is 
introduced and advanced through the first zone 52. However, 
the flight depth in the first zone 52 reduces sharply from the 
beginning to the end of the zone. As the melted inlet material 
34 leaves the first zone 52, it enters a second zone or a 
preparation and cutting zone 56. Thus, the first zone 52 
serves to melt the inlet material 34 for further processing in 
the apparatus 10. It is understood that the relative size of
each of the zones 52, 56, and 62 is shown for purposes of
illustration and clarity only, and it is within the
scope of the present invention that the lengths of these zones
52, 56, and 62 may differ depending upon the application, as is
known in the art.

The second zone 56 is established to prepare the rein- 
forcing fiber bundles 44 for shearing and impregnation as 
they are introduced into the apparatus 10 and more specifi- 
cally the internal cavity 14. The screw 24 design in this 
second zone 56 has a deep flight depth; however, the flight
depth throughout the second zone 56 remains constant so
that there is zero compression in the second zone 56. In other
words, the compression ratio is zero because there is no
change in the depth of the flights 25 in the second zone 56.
The actual screw flight depth depends on the number of 
reinforcing fiber bundles 44 fed into the second zone 56 and 
the type of the inlet material 34 used. A deep flight depth is 
necessary in the second zone 56 so as to accommodate a 
larger volume of reinforcing fiber bundles 44. The flight 
depth should be as deep as possible based on the structural
integrity of the screw 24, which is dependent on the screw
diameter. As the reinforcing fiber bundles 44 pass from the
guide 42 into the internal cavity 14, the filaments of each of
the reinforcing fiber bundles 44 are opened for proper melt
impregnation. That is, the filaments of the reinforcing fiber
bundles 44 are opened for better wetting so that each
filament can be coated with the melted inlet material 34. It
is also understood that the reinforcing fiber bundles 44 may
be pre-heated after the reinforcing fiber bundles 44 pass the
winding/unwinding reels 46 but prior to entrance into the
second inlet 36. This results in increased wettability of the
individual reinforcing fiber bundles 44 with the melted inlet
material 34.

In the second zone 56, the reinforcing fiber bundles 44 are
sheared or broken to a desirable longer length. The shearing
is accomplished as the tensile load on the reinforcing fiber
bundles 44 is increased so that each of the reinforcing fiber
bundles 44 shears to approximately the same length. As the
reinforcing fiber bundles 44 move through the second zone
56, the resistance on the reinforcing fiber bundles 44
increases so that when the resistance becomes too great, the
reinforcing fiber bundles 44 are sheared or broken, forming
individual sheared reinforcing fibers 60.

Upon exiting the second zone 56, the sheared reinforcing 
fibers 60 exit along with the melted inlet material 34 and the 
further mixing begins. The third zone 62 comprises a mixing
and impregnation zone for further mixing of the melted inlet
material 34 and the sheared reinforcing fibers 60. The
continued mixing and impregnation result in a fiber filled
melt being produced. The fiber filled melt is generally
indicated at 64 in the Figures.

The apparatus 10 further includes an outlet 66 formed in
the die 22 which serves as an exit for the fiber filled melt 64
from the die 22 after the fiber filled melt 64 travels from the
barrel 12 to the die 22. The outlet 66 preferably extends from
the die 22 in a direction perpendicular to the longitudinal
axis of the die 22 and continues through the die 22 before
reaching an outer surface of the die 22. In the preferred
embodiment, the outlet 66 is cylindrical in shape; however,
other shapes can be utilized, e.g., ribbon or sheet
shapes. The outlet 66 is located near the second end 18 of the
barrel 12 and at least a portion thereof is preferably generally
parallel to the first and second inlets 32, 36; however, the
outlet 66 extends in a direction opposite the first and second
inlets 32, 36.

As shown in FIGS. 1-5, a method of using the in-line 
compounding/extrusion deposition compression molding
apparatus 10 will now be described in greater detail. The
present invention provides a process for preparing the moldable
fiber filled melt 64. The fiber filled melt 64 is produced
from the mixing of the melted inlet material 34 and the
sheared reinforcing fibers 60. Molded structures that are
reinforced with long reinforcing fibers have mechanical
properties superior to those structures reinforced with short
fibers. To enjoy the benefits of superior mechanical
properties, the process of this invention allows for a long
reinforcing fiber to be maintained without breakage and
therefore cut at a longer length than was previously possible
in a single step process. In the process, the apparatus 10 of
the present invention mixes the longer cut reinforcing fiber
bundles 44 with the melted inlet material 34 before deposition
in a tool 70. As shown in FIG. 1, the apparatus 10 and
more specifically the barrel 12 and the die 22 are in a first
position relative to the winding/unwinding reels 46 and the
tool 70. In the first position, the die 22 is not axially aligned
with the tool 70 but rather the barrel 12 and the die 22 are
offset therefrom.

In the exemplary and illustrated embodiment in the first
position, the second inlet 36 is generally axially aligned with
a centermost winding/unwinding reel 46. In the first
position, the screw 24 generally is disposed within the barrel
12 so that the head 30 is adjacent to or abuts the complementary
beveled portion 19 of the barrel 12. The beveled
portion 19 comprises an annular beveled surface which has
a shape complementary to the head 30 so that the beveled
portion 19 acts as a stop for the screw 24. It is understood
that the illustrated embodiment shown in the Figures
is merely exemplary in nature and the present invention is
not limited to the illustrated embodiment. In the first
position, the apparatus 10 is stationary and the reinforcing
fiber bundles 44 are disposed within the winding/unwinding
reel 46 and extend through the second inlet 36 into the
internal cavity 14 so that the reinforcing fiber bundles 44 
contact the screw 24 before operation of the apparatus 10. 

As shown in FIG. 2, the process begins by rotating the 
screw 24 located inside the internal cavity 14 of the barrel 
12. Preferably, the beginning rotation speed depends
on the type of inlet material 34 and the amount of the
cut reinforcing fibers 60 in the final compound. Not shown
is the means by which the rotation of screw 24 is accomplished.
Any conventional means for rotation can be utilized.
Simultaneously, the inlet material 34 is fed under compression
into the internal cavity 14 of the barrel 12. The melting
of the inlet material 34 is achieved in the first zone 52 of the
apparatus 10. The first zone 52 is where the first operation
of the apparatus is performed. Preferably the inlet material
34 is fed under a high compression ratio, up to about 8:1 for 
the rapid melting of the inlet material 34. The flight depth in 
the first zone 52 rapidly transitions from a deep flight depth 
to a shallow flight depth within the first zone 52. 

As the inlet material 34 is melted, the screw 24 rotation 
continuously feeds the reinforcing fiber bundles 44 into the 
second zone 56. The reinforcing fiber bundles 44 are fed into 
the second zone 56 through the guide 42 from the series of 
winding/unwinding reels 46 which work in connection with 
the movement of the apparatus 10 to keep the reinforcing 
fiber bundles 44 in a constant taut state. The screw 24 
preferably has a free flowing check valve to prevent unintentional
breakage of the reinforcing fiber bundles 44. The reinforcing
fiber bundles 44 are thus introduced into the apparatus
10 after the inlet material 34 has been melted in the first
zone 52 due to the rotation of the screw 24. Thus, the 
plastication process begins prior to the introduction of the 
reinforcing fiber bundles 44 into the internal cavity 14 so 
that melted inlet material 34 is advanced into the second 
zone 56 for combination with the reinforcing fiber bundles 
44. Rotation of the winding/unwinding reels 46 is controlled 
based on the apparatus 10 location during operation. The 
movement of the winding/unwinding reels 46 can be gen- 
erated by a servo-driven motor with closed loop control or 
by pretension created by spring loading the reel 46 or any 
other known mechanical means. When closed loop control is 
chosen as the mechanism, the servo-driven motor either 
unwinds or winds the reinforcing fiber bundles 44 depending 
upon the relative position of the apparatus 10. Prior to 
moving into the second zone 56, the reinforcing fiber 
bundles 44 can be preheated to a temperature at or above the 
plastic melt temperature of the inlet material 34 to improve 
melt mixing and homogenization that takes place in the third 
zone 62. A higher melt temperature allows for better wetting 
of the reinforcing fiber bundles 44 by the melted inlet 
material 34. 
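
The closed loop reel control mentioned above can be sketched in a few lines. This is an illustrative assumption only: the specification does not disclose a control law, so the proportional rule, the class name `ReelController`, the tension setpoint, and the gain below are all hypothetical. The sketch shows only the stated behavior, namely that the reel unwinds or winds so that the fiber stays taut.

```python
from dataclasses import dataclass


@dataclass
class ReelController:
    """Hypothetical proportional controller for one winding/unwinding reel."""
    target_tension_n: float        # assumed tension setpoint, in newtons
    gain_mm_per_n: float = 0.5     # assumed gain: mm of fiber per newton of error

    def payout_command(self, measured_tension_n: float) -> float:
        """Positive result: unwind (release fiber). Negative result: wind (take up slack)."""
        error = measured_tension_n - self.target_tension_n
        return self.gain_mm_per_n * error


controller = ReelController(target_tension_n=20.0)
for measured in (26.0, 20.0, 14.0):
    command = controller.payout_command(measured)
    action = "unwind" if command > 0 else "wind" if command < 0 else "hold"
    print(f"measured {measured:.0f} N -> {action} ({command:+.1f} mm)")
```

When the barrel moves toward the tool the fiber tension rises and the sketch commands an unwind; when the barrel returns, the tension drops and the reel winds to take up the slack. A spring loaded reel, the other option named in the text, achieves the same effect mechanically.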

The rotation of the screw 24 continuously feeds the 
reinforcing fiber bundles 44 into the internal cavity 14 at the 
second zone 56 and as the reinforcing fiber bundles 44 enter 
the second zone 56, the reinforcing fiber bundles 44 are 
unwound from the winding/unwinding reels 46 keeping the 
reinforcing fiber bundles 44 stretched. Once in the second 
zone 56, the filaments of the reinforcing fiber bundles 44 are 
opened for improved wetting and the stretched reinforcing 
fiber bundles 44 are sheared or broken to a desirable longer 
length. Shearing is achieved, for example, by increasing the 
tensile load on the reinforcing fiber bundles 44 and when the 
resistance becomes too great, the reinforcing fiber bundles 
44 shear forming the sheared reinforcing fibers 60. As the 
inlet material 34 is melted and the reinforcing fiber bundles 
44 sheared, the apparatus 10 is stationary, as best shown in
FIG. 2.

As the reinforcing fiber bundles 44 are sheared in the
second zone 56, the melted inlet material 34 begins to mix 
with the cut reinforcing fibers 60 to create a fiber filled melt 
64. Continued mixing occurs as the fiber filled melt 64 is
advanced into the third zone 62 by the rotation of the
plurality of flights 25. The third zone 62 accomplishes the
uniform distributive mixing and impregnation of the sheared
reinforcing fibers 60 with the melted inlet material 34 to
form the fiber filled melt 64. As the fiber filled melt 64
accumulates, it is advanced into an accumulation zone,
generally indicated at 68, because the plurality of flights 25
continues to advance the fiber filled melt 64. The accumulation
zone 68 is that area between the head 30 of the screw
24 and the blade 23. As can be appreciated, the accumulation
zone 68 increases as the screw 24 retracts in a direction away
from the blade 23. The screw 24 retracts based on its back
pressure setting and the retraction occurs as more and more
fiber filled melt 64 builds up in the accumulation zone 68.
Accumulation of the fiber filled melt 64 occurs until enough
is gathered to create a shot having a predetermined size.
Because the blade 23 is in a closed position, the fiber filled
melt 64 continues to build between the head 30 and the blade
23 and this build-up causes the back pressure which drives
the screw 24 in a direction away from the blade 23. As
shown in FIG. 2, the screw 24 assumes a second retracted
position in which the screw 24 has been driven in a direction
away from the blade 23 to accommodate the fiber filled melt
64 between the head 30 and the blade 23.
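
The relationship between shot size and screw retraction described above can be approximated with simple geometry. The sketch below assumes, purely for illustration, that the accumulation zone is a plain cylinder with the bore diameter of the barrel and ignores the shape of the screw head; the function names and numeric values are hypothetical and do not come from the specification.

```python
import math


def retraction_for_shot(shot_volume_cm3: float, bore_diameter_cm: float) -> float:
    """Screw retraction (cm) needed to hold a shot, assuming a cylindrical zone."""
    if shot_volume_cm3 < 0 or bore_diameter_cm <= 0:
        raise ValueError("volume must be non-negative and diameter positive")
    bore_area_cm2 = math.pi * bore_diameter_cm ** 2 / 4.0
    return shot_volume_cm3 / bore_area_cm2


def shot_ready(current_retraction_cm: float, required_retraction_cm: float) -> bool:
    """True once the screw has retracted far enough for the predetermined shot."""
    return current_retraction_cm >= required_retraction_cm


# Hypothetical case: 500 cm^3 shot in an 8 cm bore.
required = retraction_for_shot(500.0, 8.0)
print(f"required retraction: {required:.2f} cm")
print("ready at 5 cm: ", shot_ready(5.0, required))
print("ready at 10 cm:", shot_ready(10.0, required))
```

Under this assumption, detecting the proper shot size reduces to monitoring the retracted position of the screw.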

As shown in FIG. 3, once the proper shot size is detected,
the apparatus 10 moves in a direction towards the tool 70. An
exemplary tool 70 comprises a press 73 with a mold 74. As
mentioned above, the apparatus 10 is moved by known
means. Because the winding/unwinding reels 46 and spools
48 are preferably stationary relative to the barrel 12 and die
22, the movement of the barrel 12 and die 22 in the direction
towards the tool 70 causes the winding/unwinding reels 46
to unwind to release an appropriate length of each of the
reinforcing fiber bundles 44 to accommodate the movement
of the barrel 12 and the die 22. This results because even in
the stage shown in FIG. 3, the reinforcing fiber bundles 44
are fed into the second inlet 36 and communicate with the
internal cavity 14 and the screw 24 disposed therein so that
ends of the reinforcing fiber bundles 44 are not free but
rather are secured within the internal cavity 14 so that they are
taut (under tension). Thus, the movement of the barrel 12
and the die 22 towards the tool 70 results in the reinforcing
fiber bundles 44 being angled relative to the barrel 12 so that
the reinforcing fiber bundles 44 feed through the guide 42
and into the second inlet 36 as shown in FIG. 3.

Once the apparatus 10 reaches an edge 71 of the tool 70,
the blade 23 in the die 22 opens and the screw 24 moves
forward, forcing the fiber filled melt 64 out through the outlet
66 and depositing the fiber filled melt 64 as the apparatus 10
moves as programmed to distribute the fiber filled melt 64
over the mold 74. Because the apparatus 10 is capable of
moving in three dimensions X, Y, and Z, the apparatus 10 is
capable of distributing the fiber filled melt 64 by moving in
the programmed X, Y, and Z directions to evenly distribute
the fiber filled melt 64 in the mold 74. Because movement
of the barrel 12 or screw 24 affects the existing tension of the
reinforcing fiber bundles 44, the winding/unwinding reels 46
are designed to either wind or unwind the reinforcing fiber
bundles 44 so that sagging and breakage of the reinforcing
fiber bundles 44 are prevented. As shown in FIG. 3, the
winding/unwinding reels 46 unwind the reinforcing fiber
bundles 44 with constant tension to permit movement of the
barrel 12 and die 22 towards the tool 70. In this deposition
stage, the screw 24 is driven towards the beveled surface 19
to once again assume the first position and cause the fiber 
filled melt 64 to be displaced through the central opening 31 
and into the outlet 66 of the die 22 where the fiber filled melt 
64 is then directed into the mold 74. 

Referring now to FIGS. 4 and 5, FIG. 4 illustrates the
apparatus 10 as the fiber filled melt 64 is deposited onto the 
mold 74. Once the apparatus 10 is properly positioned, the 
blade 23 is opened so that the fiber filled melt 64 may be 
deposited onto the mold 74. After the fiber filled melt 64 is 
deposited onto the mold 74, the press 73 is driven so as to 
close the tool 70 and compress, thereby forming the desired 
part by a compression molding technique. As shown in FIG. 
5, after the fiber filled melt 64 is deposited and prior to the 
press 73 being driven towards the mold 74, the barrel 12 and 
the die 22 move back to the first position illustrated gener- 
ally in FIG. 1. At this time, the blade 23 is repositioned to 
the closed position so that the process may be repeated. 
When the barrel 12 and the die 22 move back to the first
position, the reinforcing fiber bundles 44 are wound up by 
the winding/unwinding reels 46 so as to take up the potential 
slack which would be created by returning the barrel 12 and 
die 22 to the original first position. Because the winding/ 
unwinding reels 46 are preferably spring loaded, the rein- 
forcing fiber bundles 44 are not permitted to sag but rather 
remain under constant tension as the barrel 12 and the die 22 
move either in the direction towards the tool 70 or in the 
direction away from the tool 70. This movement of the barrel 
12 and the die 22 also likewise permits the press 73 to be 
driven towards and contact the mold 74 to produce the 
compressed formed part. Once the formed part cools, the 
tool 70 is opened and the molded fiber filled part is removed. 

The present invention advantageously provides apparatus 
10 and process for in-line compounding of reinforcing fiber 
bundles 44 and molding 34 in a single step process. The 
exemplary apparatus 10 compounds a reinforcing product by 
use of the reciprocating single screw 24 having a reinforcing 
fiber compounder and melt depositing unit. According to the 
present invention, the reinforcing fiber bundles 44 are sev- 
ered at a maximum desirable length and are maintained in a 
stretched tensioned condition regardless of the positioning 
of apparatus 10 such that the reinforcing fiber bundles 44 are
not loose and do not sag during the process. By using a
single-step process for the in-line compounding of reinforc- 
ing fibers and extrusion compression molding, the present 
invention offers a more cost effective method of producing 
a higher quality part because a single apparatus is used 
instead of the multiple part assemblies used conventionally. 
Additionally, the present invention incorporates the reinforc- 
ing fiber bundles 44 into the reciprocating single screw 
injection unit and severs and uniformly impregnates the fine 
filaments with resin keeping maximum fiber length in the 
manufactured product. Due to the longer reinforcing fiber 
retention in the manufactured product, a higher strength 
product can be produced. 

It will be understood that a person skilled in the art may 
make modifications to the preferred embodiment shown 
herein within the scope and intent of the claims. While the 
present invention has been described as carried out in a 
specific embodiment thereof, it is not intended to be limited 
thereby but is intended to cover the invention broadly within 
the scope and spirit of the claims. 

What is claimed is: 

1. A process for in-line compounding of a reinforcing fiber 
bundle with extrusion compression molding using an in-line 
compounding/extrusion deposition and molding apparatus, 
the process comprising: 
providing a first material to a first inlet formed in the
apparatus, the first inlet being in fluid communication
with an internal cavity of the apparatus, the internal
cavity having a single rotatable extruder screw, the first
material being fed into a first zone of the internal
cavity;

providing at least one reinforcing fiber bundle to a second
inlet formed in the apparatus, the at least one reinforcing
fiber bundle being fed into a second zone of the
internal cavity under a predetermined constant tension
to prevent slack in the at least one reinforcing fiber
bundle;

rotating the single rotatable extruder screw so as to melt
the first material in the first zone prior to advancing the
melted first material to the second zone, the rotation of
the single rotatable extruder screw causing the at least
one reinforcing fiber bundle to be fed into the second
zone under the predetermined constant tension;

shearing the at least one reinforcing fiber bundle in the
second zone;

mixing the sheared at least one reinforcing fiber bundle
and the melted first material in a third zone to produce
a fiber bundle filled melt;

retracting the single rotatable extruder screw to permit
expansion of a fourth zone, wherein the fiber filled melt
accumulates in the fourth zone thereby forming a shot;
and

directing the fiber bundle filled melt through an outlet
formed in the apparatus by movement of the single
rotatable extruder screw in a direction toward the
outlet.

2. The process according to claim 1, wherein the first
material comprises a material selected from the group consisting
of thermoplastic materials and thermoset materials.

3. The process according to claim 1, wherein providing at
least one reinforcing fiber bundle to the second inlet comprises:

unwinding the at least one reinforcing fiber bundle from
a spool; and

maintaining the predetermined tension by passing the
at least one reinforcing fiber bundle over a winding/
unwinding reel so that the at least one reinforcing fiber
bundle is under constant tension as the at least one
reinforcing fiber bundle is fed into the second zone.

4. The process according to claim 3, wherein the winding/
unwinding reel is spring loaded so that slack in the at least
one reinforcing fiber bundle is prevented during operation of
the apparatus and the tensile load on the at least one
reinforcing fiber bundle is maintained below the predetermined
value.

5. The process according to claim 1, wherein shearing the
at least one reinforcing fiber bundle comprises placing a
tensile load on the at least reinforcing fiber bundle so that the
tensile load exceeds a predetermined value and causes the at
least one reinforcing fiber bundle to shear.

6. The process according to claim 1, wherein the single
rotatable extruder screw has a first compression ratio in the
first zone to cause the inlet material to rapidly melt in the
first zone prior to the melted first material being advanced
into the second zone.

7. The process according to claim 6, wherein the first
compression ratio is from about 3.5:1 to about 8:1.

8. The process according to claim 1, wherein the second
zone includes a second compression ratio, wherein the
second compression ratio is constant in the second zone.

9. The process according to claim 1, further including:

moving the apparatus to a predetermined mold position
prior to directing the fiber bundle filled melt from the
outlet.

10. The process according to claim 9, wherein moving the
apparatus comprises moving the apparatus in at least one
direction of a three dimensional area.

11. A process for in-line manufacturing of a fiber rein- 
forced molded structure, the process comprising: 

feeding a first material into a first inlet of an apparatus,
said first inlet being in fluid communication with an
internal cavity of said apparatus;

advancing said first material from said first inlet to a
second inlet of said apparatus by rotating a single
extruder screw within said internal cavity such that said
first material is melted, said second inlet being in fluid
communication with said internal cavity;

feeding reinforcing fiber under a predetermined tension
into said second inlet by rotating said single extruder
screw within said internal cavity such that a fiber filled
melt is formed from said first material and said reinforcing
fiber;

advancing said fiber filled melt from said second inlet to
an outlet of said internal cavity by rotating said single
extruder screw within said internal cavity;

accumulating a shot of said fiber filled melt at said outlet
by moving said single extruder screw from a first
position to a second position;

distributing said shot on a mold by opening said outlet,
moving said single extruder screw from said second
position to said first position, and moving said outlet
with respect to said mold; and

closing said mold to form the fiber reinforced molded
structure from said shot.

12. The process as in claim 11, wherein said inlet material 
comprises material selected from the group consisting of 
thermoplastic materials and thermoset materials. 

13. The process as in claim 12, wherein said reinforcing 
is selected from the group consisting of glass fibers, natural 
fibers, polyaramid fibers, and carbon fibers. 

14. The process as in claim 11, wherein moving said outlet 
with respect to said mold further comprises: 

moving said apparatus in at least one direction of a three 
dimensional area. 

15. The process as in claim 11, wherein feeding said 
reinforcing fiber under said predetermined tension com- 
prises: 

unwinding said reinforcing fiber from a spool; and 
maintaining said predetermined tension by passing said 
reinforcing fiber over a winding/unwinding reel so that 
said reinforcing fiber is under said predetermined ten- 
sion as said reinforcing fiber is fed into said second 
inlet. 

16. The process as in claim 15, wherein said predeter- 
mined tension prevents sagging and premature shearing of 
said reinforcing fiber.

* * * * * 
HUMAN UNCOUPLING PROTEINS AND
POLYNUCLEOTIDES ENCODING THE 
SAME 

The present application claims priority to U.S. Applica- 
tion Ser. No. 60/119,228, filed Feb. 9, 1999, and U.S. 
Application Ser. No. 60/158,458, filed Oct. 8, 1999, which 
are herein incorporated by reference in their entirety. 

1. INTRODUCTION 

The present invention relates to the discovery, 
identification, and characterization of novel human poly- 
nucleotide sequences and the novel polypeptides encoded 
thereby. The invention encompasses the described 
polynucleotides, host cell expression systems, the encoded 
proteins or polypeptides, and fusion proteins and peptides 
derived therefrom, antibodies to the encoded proteins or 
peptides, and genetically engineered animals that lack func- 
tional forms of the genes encoding the disclosed sequences, 
over express the disclosed sequences, as well as antagonists 
and agonists of the proteins, along with other compounds 
that modulate the expression or activity of the proteins 
encoded by the disclosed sequences that can be used for 
diagnosis, drug screening, clinical trial monitoring, the 
treatment of physiological or behavioral disorders, or oth- 
erwise improving the quality of life. 

2. BACKGROUND OF THE INVENTION 

Uncoupling proteins (UCPs) are found in the 
mitochondria, but are encoded within the nucleus. In the 
mitochondria, UCPs uncouple, or regulate, the gradient that 
drives energy production in the cell/body. As such, UCPs 
effectively modulate the efficiency of energy production in 
the body, and hence body metabolism. Given the role of 
UCPs in the body, they are thought to be important targets 
for the study of thermogenesis, obesity, cachexia, and other 
metabolically related physiological functions, diseases, and 
disorders. 

3. SUMMARY OF THE INVENTION 

The present invention relates to the discovery, 
identification, and characterization of nucleotides that 
encode novel human UCPs, and the corresponding amino 
acid sequences encoded by the disclosed sequences. The 
novel human uncoupling proteins (NUCPs) described for the 
first time herein share structural relatedness with other 
mammalian uncoupling proteins and brain mitochondrial 
carrier proteins. The novel human nucleic acid sequences 
described herein encode proteins of 291 and 293 amino 
acids in length (see SEQ ID NOS:2 and 4). 

A murine homologue of the described NUCPs has been 
identified and a "knockout" ES cell line has been produced 
using the methods described in U.S. application Ser. No. 
08/942,806, herein incorporated by reference. Alternatively, 
such knockout cells and animals can be produced using 
conventional methods for generating genetically engineered 
animals and cells (see, for example, PCT Applic. No. PCT/
US98/03243, filed Feb. 20, 1998, herein incorporated by 
reference). Accordingly, an additional aspect of the present 
invention includes knockout cells and animals having 
genetically engineered mutations in a gene encoding the
presently described NUCPs.

The invention encompasses the nucleotides presented in 
the Sequence Listing, host cells expressing such nucleotides, 
and the expression products of such nucleotides, and: (a) 
nucleotides that encode mammalian homologs of the
described genes, including the specifically described
NUCPs, and the NUCP products; (b) nucleotides that encode
one or more portions of the NUCPs that correspond to
functional domains, and the polypeptide products specified
by such nucleotide sequences, including but not limited to
the novel regions of any active domain(s); (c) isolated
nucleotides that encode mutant versions, engineered or
naturally occurring, of the described NUCPs in which all or
a part of at least one domain is deleted or altered, and the
polypeptide products specified by such nucleotide
sequences, including but not limited to soluble proteins and
peptides in which all or a portion of the signal sequence is
deleted; (d) nucleotides that encode chimeric fusion proteins
containing all or a portion of a coding region of NUCP, or
one of its domains (e.g., a transmembrane domain, accessory
protein/self-association domain, etc.) fused to another
peptide or polypeptide.

The invention also encompasses agonists and antagonists
of NUCPs, including small molecules, large molecules,
mutant NUCPs, or portions thereof that compete with or
bind to native NUCPs, antibodies, and nucleotide sequences
that can be used to inhibit the expression of the described
NUCPs (e.g., antisense, ribozyme molecules, and gene or
regulatory sequence replacement constructs) or to enhance
the expression of the described NUCPs (e.g., expression
constructs that place the described genes under the control of
a strong promoter system), as well as transgenic animals that
express a NUCP transgene, or "knockouts" (which can be
conditional) that do not express functional NUCP.

Further, the present invention also relates to methods for
using the described NUCP products for the identification
of compounds that modulate, i.e., act as agonists or
antagonists of, NUCP expression and/or NUCP product
activity. Such compounds can be used as therapeutic agents
for the treatment of any of a wide variety of symptomatic
representations of biological disorders or imbalances.

An additional embodiment of the present invention 
includes therapy and treatments mediated by NUCP gene 
delivery. Gene delivery can be to somatic or stem cells, and 
may be effected using viral (e.g., retrovirus, adeno-associated
virus, etc.) or non-viral (e.g., cationic lipids, formulations
using "naked" DNA, etc.) methods.

4. DESCRIPTION OF THE SEQUENCE LISTING
AND FIGURES

The Sequence Listing provides the sequences of the 
NUCP polynucleotides, and the amino acid sequences 
encoded thereby. 

5. DETAILED DESCRIPTION OF THE
INVENTION

The NUCPs described for the first time herein are novel
proteins that are expressed, inter alia, in gene trapped human
cells, human lymph node or kidney cells, and/or ES cells.
The NUCPs exert biological effect by regulating the
efficiency of energy generation in the body with the result being
that excess resources are converted to heat or are otherwise
stored as fat, etc. Regulating the function of a NUCP product
will affect NUCP-mediated processes with resulting effects
on fat production and usage, superoxide generation and
regulation, and all biological properties and functions that
are tied to fatty acid metabolism. Because of these important
roles, UCPs have been the focus of intense scientific scrutiny
(see PCT Application No. PCT/EP98/02645, U.S. Pat. Nos.
5,853,975, 5,741,666 and 5,702,902 all of which are herein
incorporated by reference in their entirety).
The present invention encompasses the use of the
described NUCP nucleotides, NUCPs and NUCP peptides 
therefrom, as well as antibodies, preferably humanized 
monoclonal antibodies, or binding fragments, domains, or 
fusion proteins thereof, or antiidiotypic variants derived 
therefrom, that bind NUCP (which can, for example, also act 
as NUCP agonists or antagonists), other antagonists that 
inhibit binding activity or expression, or agonists that acti- 
vate NUCP receptor activity or increase NUCP expression, 
in the diagnosis and/or treatment of disease. 

In particular, the invention described in the subsections 
below encompasses NUCP polypeptides or peptides corre- 
sponding to functional domains of NUCPs, mutated, trun- 
cated or deleted NUCPs (e.g., NUCPs missing one or more 
functional domains or portions thereof), NUCP fusion pro- 
teins (e.g., where NUCP or a functional domain of NUCP is 
fused to an unrelated protein or peptide such as an immu- 
noglobulin constant region, i.e., IgFc), nucleotide sequences 
encoding such products, and host cell expression systems 
that can produce such NUCP products. 

The invention also encompasses antibodies and anti- 
idiotypic antibodies (including Fab fragments), antagonists 
and agonists of the NUCP, as well as compounds or nucle- 
otide constructs that inhibit expression of a NUCP gene 
(transcription factor inhibitors, antisense and ribozyme 
molecules, or gene or regulatory sequence replacement 
constructs), or promote expression of a NUCP (e.g., expres- 
sion constructs in which a NUCP coding sequence is opera- 
tively associated with expression control elements such as 
promoters, promoter/enhancers, etc.). The invention also 
relates to host cells and animals genetically engineered to 
express a NUCP (or mutant variants thereof) or to inhibit or 
"knockout" expression of an animal homolog of a NUCP 
gene. 

The NUCPs, NUCP peptides, and NUCP fusion proteins 
derived therefrom, NUCP nucleotide sequences, antibodies, 
antagonists and agonists can be useful for the detection of 
mutant NUCPs or inappropriately expressed NUCPs for the 
diagnosis of biological disorders (high blood pressure, 
obesity, etc.) and disease. The NUCP products or peptides, 
NUCP fusion proteins, NUCP nucleotide sequences, host 
cell expression systems, antibodies, antagonists, agonists 
and genetically engineered cells and animals can also be 
used for screening for drugs (or high throughput screening 
of combinatorial libraries) effective in the treatment of the 
symptomatic or phenotypic manifestations of perturbing the 
normal function of NUCP in the body. The use of engineered 
host cells and/or animals may offer an advantage in that such 
systems allow not only for the identification of compounds 
that bind to an endogenous NUCP, but can also identify 
compounds that facilitate or inhibit NUCP-mediated uncou- 
pling. 

Of particular interest are genetically engineered nucle- 
otide constructs, or expression vectors, that encode NUCP 
products and derivatives (NUCP peptides, fusions, etc). 
Nucleotide constructs encoding such NUCP products and 
derivatives can be used to genetically engineer host cells to 
express such products in vivo; these genetically engineered 
cells function as "bioreactors" in the body delivering a 
continuous supply of a NUCP product, NUCP peptide, or 
NUCP fusion protein to the body. Nucleotide constructs 
encoding functional NUCPs, mutant NUCPs, as well as 
antisense and ribozyme molecules can also be used in "gene 
therapy" approaches for the modulation of NUCP expres- 
sion. Thus, the invention also encompasses pharmaceutical 
formulations and methods for treating biological disorders. 

Therapeutic gene delivery of the described NUCP nucle- 
otides can be effected by a variety of methods. For example, 
methods of retroviral human gene therapy are described in,
inter alia, U.S. Pat. Nos. 5,399,346 and 5,858,740;
adenoviral vectors for gene therapy/delivery are described in U.S.
Pat. No. 5,824,544; adeno-associated viral vectors are
described in U.S. Pat. Nos. 5,843,742, 5,780,280, and
5,846,528; herpes virus vectors are described in U.S. Pat. No.
5,830,727, and other vectors and methods of nonvirally
(e.g., polynucleotides that are not encapsulated by viral
capsid protein, "naked" DNA, or DNA formulated in lipid or
chemical complexes) introducing foreign genetic material of
recombinant origin into a host mammalian, and preferably
human, cell are described in U.S. Pat. Nos. 5,827,703 and
5,840,710 all of which are herein incorporated by reference
in their entirety. When the above methods are applied to
selectively express or inhibit the expression of a NUCP in
tumor/diseased cells, the described methods and
compositions can be used as chemotherapeutic agents for the
treatment of cancer and other diseases and disorders.

Various aspects of the invention are described in greater
detail in the subsections below.

5.1. The NUCP Polynucleotides

The cDNA sequences (SEQ ID NOS:1 and 3) and
deduced amino acid sequences (SEQ ID NOS:2 and 4) of the
described NUCPs are presented in the Sequence Listing. The
NUCP cDNA sequences were obtained from human lymph
node, kidney, and fetal brain cDNA libraries (Edge
Biosystems, Gaithersburg, Md.) using probes and/or primers
generated from gene trapped sequence tags and a human
homolog of the described NUCPs. RT-PCR analysis
indicated that expression of the described NUCPs can be
detected in, inter alia, human cerebellum, spinal cord,
thymus, spleen, lymph node, bone marrow, trachea, lung,
kidney, fetal liver, prostate, testis, thyroid, salivary gland,
stomach, heart, uterus, and mammary gland, with
particularly strong expression in kidney, adrenal gland, and skeletal
muscle. The above expression studies were largely verified
by Northern analysis that also detected particularly strong
expression in human skeletal muscle, heart, adrenal gland,
and kidney.

The NUCPs of the present invention include: (a) the
human DNA sequences presented in the Sequence Listing
and additionally contemplates any nucleotide sequence
encoding a contiguous and functional NUCP open reading
frame (ORF) that hybridizes to a complement of the DNA
sequence presented in the Sequence Listing under highly
stringent conditions, e.g., hybridization to filter-bound DNA
in 0.5 M NaHPO4, 7% sodium dodecyl sulfate (SDS), 1 mM
EDTA at 65° C., and washing in 0.1xSSC/0.1% SDS at 68°
C. (Ausubel F. M. et al., eds., 1989, Current Protocols in
Molecular Biology, Vol. I, Green Publishing Associates,
Inc., and John Wiley & Sons, Inc., New York, at p. 2.10.3)
and encodes a functionally equivalent gene product.
Additionally contemplated are any nucleotide sequences that
hybridize to the complement of the DNA sequence that
encode and express an amino acid sequence presented in the
Sequence Listing under moderately stringent conditions,
e.g., washing in 0.2xSSC/0.1% SDS at 42° C. (Ausubel et
al., 1989, supra), yet still encode a functionally equivalent
NUCP product. Functional equivalents of a NUCP include
naturally occurring NUCPs present in other species, and
mutant NUCPs whether naturally occurring or engineered.
The invention also includes degenerate variants of the
disclosed sequences.

The invention also includes nucleic acid molecules,
preferably DNA molecules, that hybridize to, and are therefore
the complements of, the described NUCP nucleotide
sequences. Such hybridization conditions may be highly
stringent or less highly stringent, as described above. In 
instances wherein the nucleic acid molecules are deoxyoli- 
gonucleotides ("DNA oligos"), such molecules are particu- 
larly about 16 to about 100 bases long, about 20 to about 80, 
or about 34 to about 45 bases long, or any variation or 
combination of sizes represented therein that incorporate a 
contiguous region of sequence first disclosed in the present 
Sequence Listing. Such oligonucleotides can be used in 
conjunction with the polymerase chain reaction (PCR) to 
screen libraries, isolate clones, and prepare cloning and 
sequencing templates, etc. Alternatively, the NUCP oligo- 
nucleotides can be used as hybridization probes for screen- 
ing libraries or assessing gene expression patterns 
(particularly using a micro array or high-throughput "chip" 
format). In chip applications, a series of the
described NUCP oligonucleotide sequences, or the comple- 
ments thereof, can be used to represent all or a portion of the 
described NUCP sequences. The oligonucleotides, typically 
between about 16 to about 40 (or any whole number within 
the stated range) nucleotides in length may partially overlap 
each other and/or the NUCP sequence may be represented 
using oligonucleotides that do not overlap. Accordingly, the 
described NUCP polynucleotide sequences shall typically 
comprise at least about two or three distinct oligonucleotide 
sequences of at least about 18, and preferably about 25, 
nucleotides in length that are each first disclosed in the 
described Sequence Listing. Such oligonucleotide 
sequences may begin at any nucleotide present within a 
sequence in the Sequence Listing and proceed in either a 
sense (5'-to-3') orientation vis-a-vis the described sequence 
or in an antisense orientation. 
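
The tiling of a sequence into overlapping or non-overlapping oligonucleotides, in either a sense or an antisense orientation, can be illustrated with a short Python sketch. The function names, the default length of 25 nucleotides, and the 40-base example sequence are assumptions made for illustration only; the sequence is hypothetical and is not taken from the Sequence Listing.

```python
def reverse_complement(seq: str) -> str:
    """Return the antisense (reverse-complement) strand of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def tile_oligos(seq: str, length: int = 25, step: int = 25,
                antisense: bool = False) -> list[str]:
    """Split seq into oligonucleotides of a fixed length.

    step < length yields partially overlapping oligos; step == length
    yields oligos that do not overlap. Trailing bases that cannot fill
    a complete oligo are dropped.
    """
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    oligos = [seq[i:i + length].upper()
              for i in range(0, len(seq) - length + 1, step)]
    if antisense:
        return [reverse_complement(oligo) for oligo in oligos]
    return oligos


# Hypothetical 40-base sequence used only to exercise the functions.
example = "ATGGCTAGCTTGACCGATCGGATCCAAGCTTGGCATGCAA"

# 20-mers starting every 10 bases: each oligo overlaps its neighbor by 10.
overlapping = tile_oligos(example, length=20, step=10)

# Non-overlapping 20-mers reported in the antisense orientation.
antisense_set = tile_oligos(example, length=20, step=20, antisense=True)
```

A step smaller than the oligonucleotide length corresponds to the partially overlapping case described above, and a step equal to the length corresponds to the non-overlapping case.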

For oligonucleotide probes, highly stringent conditions 
may refer, e.g., to washing in 6xSSC/0.05% sodium pyro- 
phosphate at 37° C. (for 14-base oligos), 48° C. (for 17-base 
oligos), 55° C. (for 20-base oligos), and 60° C. (for 23-base 
oligos). These nucleic acid molecules may encode or act as 
NUCP gene antisense molecules, useful, for example, in 
NUCP gene regulation (and/or as antisense primers in
amplification reactions of NUCP gene nucleic acid 
sequences). With respect to NUCP gene regulation, such 
techniques can be used to regulate biological functions. 
Further, such sequences may be used as part of ribozyme 
and/or triple helix sequences that are also useful for NUCP 
gene regulation. 

Additionally, the antisense oligonucleotides may comprise
at least one modified base moiety which is selected
from the group including but not limited to 5-fluorouracil,
5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine,
xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)
uracil, 5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine,
5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil,
5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine,
5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic
acid (v), wybutoxosine, pseudouracil, queosine,
2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil,
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid
methylester, uracil-5-oxyacetic acid (v),
5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w,
and 2,6-diaminopurine.

The antisense oligonucleotide may also comprise at least 
one modified sugar moiety selected from the group includ- 
ing but not limited to arabinose, 2-fluoroarabinose, xylulose, 
and a hexose. 
In yet another embodiment, the antisense oligonucleotide
comprises at least one modified phosphate backbone
selected from the group consisting of a phosphorothioate, a
phosphorodithioate, a phosphoramidothioate, a
phosphoramidate, a phosphordiamidate, a
methylphosphonate, an alkyl phosphotriester, and a
formacetal or analog thereof.

In yet another embodiment, the antisense oligonucleotide
is an α-anomeric oligonucleotide. An α-anomeric
oligonucleotide forms specific double-stranded hybrids with
complementary RNA in which, contrary to the usual β-units,
the strands run parallel to each other (Gautier et al., 1987,
Nucl. Acids Res. 15:6625-6641). The oligonucleotide is a
2'-O-methylribonucleotide (Inoue et al., 1987, Nucl. Acids
Res. 15:6131-6148), or a chimeric RNA-DNA analogue
(Inoue et al., 1987, FEBS Lett. 215:327-330).

Oligonucleotides of the invention may be synthesized by
standard methods known in the art, e.g. by use of an
automated DNA synthesizer (such as are commercially
available from Biosearch, Applied Biosystems, etc.). As
examples, phosphorothioate oligonucleotides may be
synthesized by the method of Stein et al. (1988, Nucl. Acids
Res. 16:3209), methylphosphonate oligonucleotides can be
prepared by use of controlled pore glass polymer supports
(Sarin et al., 1988, Proc. Natl. Acad. Sci. U.S.A.
85:7448-7451), etc.

Low stringency conditions are well known to those of
skill in the art, and will vary predictably depending on the
specific organisms from which the library and the labeled
sequences are derived. For guidance regarding such
conditions see, for example, Sambrook et al., 1989, Molecular
Cloning, A Laboratory Manual (and periodic updates
thereof), Cold Spring Harbor Press, N.Y.; and Ausubel et
al., 1989, Current Protocols in Molecular Biology, Green
Publishing Associates and Wiley Interscience, N.Y.

Alternatively, suitably labeled NUCP nucleotide probes
can be used to screen a human genomic library using
appropriately stringent conditions or by PCR. The
identification and characterization of human genomic clones is
helpful for identifying polymorphisms, determining the
genomic structure of a given locus/allele, and designing
diagnostic tests. For example, sequences derived from
regions adjacent to the intron/exon boundaries of the human
gene can be used to design primers for use in amplification
assays to detect mutations within the exons, introns, splice
sites (e.g., splice acceptor and/or donor sites), etc., that can
be used in diagnostics and pharmacogenomics.

Further, a NUCP gene homolog can be isolated from
nucleic acid of the organism of interest by performing PCR
using two degenerate or "wobble" oligonucleotide primer
pools designed on the basis of amino acid sequences within
the NUCP product disclosed herein. The template for the
reaction may be total RNA, mRNA, and/or cDNA obtained
by reverse transcription of mRNA prepared from, for
example, human or non-human cell lines or tissue, such as
choroid plexus, known or suspected to express a NUCP gene
allele.
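
How a degenerate or "wobble" primer pool follows from an amino acid sequence can be sketched in Python using the standard genetic code. This is a minimal illustration under stated assumptions: the function names and the four-residue peptide "MWKF" are hypothetical and are not drawn from the Sequence Listing, and practical primer design would also weigh pool size, melting temperature, and codon usage.

```python
from itertools import product

# Standard genetic code: DNA codons for each amino acid (one-letter codes).
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"],
    "D": ["GAT", "GAC"],
    "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
    "M": ["ATG"],
    "F": ["TTT", "TTC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"],
    "Y": ["TAT", "TAC"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
}


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that can encode the peptide."""
    count = 1
    for residue in peptide.upper():
        count *= len(CODONS[residue])
    return count


def primer_pool(peptide: str) -> list[str]:
    """Enumerate every DNA sequence that can encode the peptide."""
    choices = (CODONS[residue] for residue in peptide.upper())
    return ["".join(codons) for codons in product(*choices)]


# Hypothetical peptide chosen for low degeneracy (M and W have one codon each).
pool = primer_pool("MWKF")
```

Stretches rich in methionine and tryptophan keep the pool small, whereas leucine, serine, and arginine (six codons each) enlarge it quickly, which is why such pools are normally designed against low-degeneracy regions.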

The PCR product may be subcloned and sequenced to
ensure that the amplified sequences represent the sequence
of the desired NUCP gene. The PCR fragment may then be
used to isolate a full length cDNA clone by a variety of
methods. For example, the amplified fragment may be
labeled and used to screen a cDNA library, such as a
bacteriophage cDNA library. Alternatively, the labeled
fragment may be used to isolate genomic clones via the
screening of a genomic library.
PCR technology may also be utilized to isolate full length
cDNA sequences. For example, RNA may be isolated,
following standard procedures, from an appropriate cellular
or tissue source (i.e., one known, or suspected, to express a
NUCP gene, such as, for example, brain tissue). A reverse
transcription (RT) reaction may be performed on the RNA
using an oligonucleotide primer specific for the most 5' end
of the amplified fragment for the priming of first strand
synthesis. The resulting RNA/DNA hybrid may then be
tailed using a standard terminal transferase reaction, the
hybrid may be digested with RNase H, and second strand
synthesis may then be primed with a complementary primer.
Thus, cDNA sequences upstream of the amplified fragment
may easily be isolated. For a review of cloning strategies
which may be used, see e.g., Sambrook et al., 1989, supra.

A cDNA of a mutant NUCP gene may be isolated, for
example, by using PCR. In this case, the first cDNA strand
may be synthesized by hybridizing an oligo-dT
oligonucleotide to mRNA isolated from tissue known or suspected to
be expressed in an individual putatively carrying a mutant
NUCP allele, and by extending the new strand with reverse
transcriptase. The second strand of the cDNA is then
synthesized using an oligonucleotide that hybridizes
specifically to the 5' end of the normal gene. Using these two
primers, the product is then amplified via PCR, optionally
cloned into a suitable vector and subjected to DNA
sequence analysis through methods well known to those of
skill in the art. By comparing the DNA sequence of the
mutant NUCP allele to that of the normal NUCP allele, the
mutation(s) responsible for the loss or alteration of function
of the mutant NUCP gene product can be ascertained.

Alternatively, a genomic library can be constructed using
DNA obtained from an individual suspected of or known to
carry the mutant NUCP allele (e.g., a person manifesting a
NUCP-associated phenotype such as, for example, obesity,
high blood pressure, etc.), or a cDNA library can be
constructed using RNA from a tissue known, or suspected, to
express a mutant NUCP allele. The normal NUCP gene, or
any suitable fragment thereof, can then be labeled and used
as a probe to identify the corresponding mutant NUCP allele
in such libraries. Clones containing the mutant NUCP gene
sequences may then be purified and subjected to sequence
analysis according to methods well known to those of skill
in the art.

Additionally, an expression library can be constructed
utilizing cDNA synthesized from, for example, RNA
isolated from a tissue known, or suspected, to express a mutant
NUCP allele in an individual suspected of or known to carry
such a mutant allele. In this manner, gene products made by
the putatively mutant tissue may be expressed and screened
using standard antibody screening techniques in conjunction
with antibodies raised against the normal NUCP product as
described below (For screening techniques, see, for
example, Harlow, E. and Lane, eds., 1988, "Antibodies: A
Laboratory Manual", Cold Spring Harbor Press, Cold Spring
Harbor.)

Additionally, screening can be accomplished by screening
with labeled NUCP fusion proteins, such as, for example,
AP-NUCP or NUCP-AP fusion proteins. In cases where a
NUCP mutation results in an expressed gene product with
altered function (e.g., as a result of a missense or a
frameshift mutation), a polyclonal set of antibodies to NUCP are
likely to cross-react with the mutant NUCP gene product.
Library clones detected via their reaction with such labeled
antibodies can be purified and subjected to sequence
analysis according to methods well known in the art.

The invention also encompasses nucleotide sequences
that encode mutant NUCPs, peptide fragments of NUCPs,
truncated NUCPs, and NUCP fusion proteins. These
include, but are not limited to nucleotide sequences
encoding the mutant NUCPs described below; polypeptides or
peptides corresponding to one or more domains of NUCP or
portions of these domains; truncated NUCPs in which one or
more of the domains is deleted, or a truncated nonfunctional
NUCPs. Nucleotides encoding fusion proteins may include,
but are not limited to, full length NUCP sequences, truncated
NUCPs, or nucleotides encoding peptide fragments of a
NUCP fused to an unrelated protein or peptide, such as for
example, a NUCP domain fused to an Ig Fc domain which
increases the stability and half life of the resulting fusion
protein (e.g., NUCP-Ig) in the bloodstream; or an enzyme
such as a fluorescent protein or a luminescent protein which
can be used as a marker.

The invention also encompasses (a) DNA vectors that
contain any of the foregoing NUCP coding sequences and/or
their complements (i.e., antisense); (b) DNA expression
vectors that contain any of the foregoing NUCP coding
sequences operatively associated with a regulatory element
that directs the expression of the coding sequences; (c)
genetically engineered host cells that contain any of the
foregoing NUCP coding sequences operatively associated
with a regulatory element that directs the expression of the
coding sequences in the host cell; and (d) genetically
engineered host cells that express an endogenous NUCP gene
under the control of an exogenously introduced regulatory
element (i.e., gene activation). As used herein, regulatory
elements include, but are not limited to, inducible and
non-inducible promoters, enhancers, operators and other
elements known to those skilled in the art that drive and
regulate expression. Such regulatory elements include but
are not limited to the cytomegalovirus hCMV immediate
early gene, regulatable, viral (particularly retroviral LTR
promoters) the early or late promoters of SV40 adenovirus,
the lac system, the trp system, the TAC system, the TRC
system, the major operator and promoter
regions of phage lambda, the control regions of fd coat
protein, the promoter for 3-phosphoglycerate kinase (PGK),
the promoters of acid phosphatase, and the promoters of the
yeast α-mating factors.

5.2. The NUCPs and NUCP Polypeptides and Peptides
Derived Therefrom

The NUCPs, NUCP polypeptides, NUCP peptide
fragments, mutated, truncated, or deleted forms of a NUCP,
and/or NUCP fusion proteins can be prepared for a variety
of uses, including but not limited to the generation of
antibodies, as reagents in diagnostic assays, the
identification of other cellular gene products related to a NUCP, as
reagents in assays for screening for compounds that can be
used as pharmaceutical reagents useful in the therapeutic
treatment of mental, biological, or medical disorders and disease.

The Sequence Listing discloses the amino acid sequences
encoded by the described NUCP polynucleotides. The
NUCP sequences both display initiator methionines that are
present in a DNA sequence context consistent with a
translation initiation site (Kozak sequence).

The NUCP sequences of the invention include the
nucleotide and amino acid sequences presented in the Sequence
Listing as well as analogues and derivatives thereof. Further,
corresponding NUCP homologues from other species are
encompassed by the invention. In fact, any NUCP protein
encoded by the NUCP nucleotide sequences described
above are within the scope of the invention as are any novel
polynucleotide sequences encoding all or any novel portion
of an amino acid sequence presented in the Sequence
Listing. The degenerate nature of the genetic code is well known, and, accordingly, each
amino acid presented in the Sequence Listing, is generically representative of the well
known nucleic acid "triplet" codon, or in many cases codons, that can encode the amino
acid. As such, as contemplated herein, the amino acid sequences presented in the
Sequence Listing, when taken together with the genetic code (see, for example, Table 4-1
at page 109 of "Molecular Cell Biology", 1986, J. Darnell et al. eds., Scientific American
Books, New York, N.Y., herein incorporated by reference) are generically representative of
all the various permutations and combinations of nucleic acid sequences that can encode
such amino acid sequences.

The invention also encompasses proteins that are functionally equivalent to the NUCP
encoded by the presently described nucleotide sequences, as judged by any of a number of
criteria, including, but not limited to, the ability to partition into the mitochondria, or
other cellular membrane structure, and effect uncoupling activity, change in cellular
metabolism (e.g., ion flux, tyrosine phosphorylation, etc.), or change in phenotype when
the NUCP equivalent is expressed at similar levels, or mutated, in an appropriate cell
type (such as the amelioration, prevention or delay of a biochemical, biophysical, or
overt phenotype). Functional equivalents of a NUCP include naturally occurring NUCPs
present in other species and mutant NUCPs whether naturally occurring or engineered (by
site directed mutagenesis, gene shuffling, directed evolution as described in, for
example, U.S. Pat. No. 5,837,458). The invention also includes degenerate nucleic acid
variants and splice variant of the disclosed NUCP polynucleotide sequence.

Additionally contemplated are polynucleotides encoding NUCP ORFs, or their functional
equivalents, encoded by polynucleotide sequences that are about 99, 95, 90, or about
85 percent similar or identical to corresponding regions of the nucleotide sequences of
the Sequence Listing (as measured by BLAST sequence comparison analysis using, for
example, the GCG sequence analysis package using standard default settings).

Functionally equivalent NUCP proteins include, but are not limited to, additions or
substitutions of amino acid residues within the amino acid sequence encoded by the
NUCP nucleotide sequences described above, but which result in a silent change, thus
producing a functionally equivalent gene product. Amino acid substitutions may be made
on the basis of similarity in polarity, charge, solubility, hydrophobicity,
hydrophilicity, and/or the amphipathic nature of the residues involved. For example,
nonpolar (hydrophobic) amino acids include alanine, leucine, isoleucine, valine, proline,
phenylalanine, tryptophan, and methionine; polar neutral amino acids include glycine,
serine, threonine, cysteine, tyrosine, asparagine, and glutamine; positively charged
(basic) amino acids include arginine, lysine, and histidine; and negatively charged
(acidic) amino acids include aspartic acid and glutamic acid.

While random mutations can be made to NUCP encoding DNA (using random mutagenesis
techniques well known to those skilled in the art) and the resulting mutant NUCPs
tested for activity, site-directed mutations of the NUCP coding sequence can be
engineered (using site-directed mutagenesis techniques well known to those skilled in the
art) to generate mutant NUCPs with increased function, e.g., higher receptor binding
affinity, decreased function, and/or increased physiological half-life, and increased
signal transduction triggering. One starting point for such analysis is by aligning the
disclosed human sequences with corresponding gene/protein sequences from, for example,
other mammals in order to identify amino acid sequence motifs that are conserved between
different species. Non-conservative changes can be engineered at variable positions to
alter function, signal transduction capability, or both. Alternatively, where alteration
of function is desired, deletion or non-conservative alterations of the conserved regions
(i.e., identical amino acids) can be engineered. For example, deletion or
non-conservative alterations (substitutions or insertions) of the various conserved
transmembrane domains.

Other mutations to a NUCP coding sequence can be made to generate NUCPs that are better
suited for expression, scale up, etc. in the host cells chosen. For example, cysteine
residues can be deleted or substituted with another amino acid in order to eliminate
disulfide bridges; N-linked glycosylation sites can be altered or eliminated to achieve,
for example, expression of a homogeneous product that is more easily recovered and
purified from yeast hosts which are known to hyperglycosylate N-linked sites. To this
end, a variety of amino acid substitutions at one or both of the first or third amino
acid positions of any one or more of the glycosylation recognition sequences which occur
in an ECD (N-X-S or N-X-T), and/or an amino acid deletion at the second position of any
one or more such recognition sequences in an ECD will prevent glycosylation of the
NUCP at the modified tripeptide sequence. (See, e.g., Miyajima et al., 1986, EMBO J.
5(6):1193-1197).

Peptides corresponding to one or more domains of a NUCP, truncated or deleted NUCPs, as
well as fusion proteins in which a full length NUCP, a NUCP peptide, or a truncated NUCP
is fused to an unrelated protein, are also within the scope of the invention and can be
designed on the basis of the presently disclosed NUCP gene nucleotide and NUCP amino
acid sequences. Such fusion proteins include, but are not limited to, Ig Fc fusions which
stabilize a NUCP protein, or NUCP peptides, and prolong half-life in vivo; or fusions to
any amino acid sequence that allows the fusion protein to be anchored to the cell
membrane; or fusions to an enzyme, fluorescent protein, or luminescent protein which
provide a marker function.

While the NUCPs and NUCP peptides can be chemically synthesized (e.g., see Creighton,
1983, Proteins: Structures and Molecular Principles, W. H. Freeman & Co., N.Y.), large
polypeptides derived from a full length NUCP can be advantageously produced by
recombinant DNA technology using techniques well known in the art for expressing nucleic
acids containing NUCP gene sequences and/or coding sequences. Such methods can be used
to construct expression vectors containing the described NUCP nucleotide sequences and
appropriate transcriptional and translational control signals. These methods include, for
example, in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic
recombination. See, for example, the techniques described in Sambrook et al., 1989,
supra, and Ausubel et al., 1989, supra. Alternatively, RNA corresponding to all or a
portion of a transcript encoded by a NUCP gene sequence can be chemically synthesized
using, for example, synthesizers. See, for example, the techniques described in
"Oligonucleotide Synthesis", 1984, Gait, M. J. ed., IRL Press, Oxford, which is
incorporated by reference herein in its entirety.

A variety of host-expression vector systems can be utilized to express the NUCP-encoding
nucleotide sequences of the invention. Where a NUCP peptide or polypeptide is a soluble
derivative (e.g., NUCP peptides corresponding to an ECD; truncated or deleted NUCP in
which a TM and/or CD are deleted, etc.) the peptide can be recovered from the host
cell in cases where the NUCP peptide or polypeptide is not secreted, and from the culture
media in cases where the NUCP peptide or polypeptide is secreted by the cells. However,
such expression systems also encompass engineered host cells that express a NUCP, or a
functional equivalent thereof, in situ, i.e., anchored in the cell membrane. Purification
or enrichment of a NUCP from such expression systems can be accomplished using
appropriate detergents and lipid micelles and methods well known to those skilled in the
art. However, such engineered host cells themselves may be used in situations where it is
important not only to retain the structural and functional characteristics of a NUCP, but
to assess biological activity, e.g., in drug screening assays.

The expression systems that can be used for purposes of the invention include but are
not limited to microorganisms such as bacteria (e.g., E. coli, B. subtilis) transformed
with recombinant bacteriophage DNA, plasmid DNA or cosmid DNA expression vectors
containing NUCP nucleotide sequences; yeast (e.g., Saccharomyces, Pichia) transformed
with recombinant yeast expression vectors containing NUCP nucleotide sequences; insect
cell systems infected with recombinant virus expression vectors (e.g., baculovirus)
containing NUCP sequences; plant cell systems infected with recombinant virus expression
vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed
with recombinant plasmid expression vectors (e.g., Ti plasmid) containing NUCP nucleotide
sequences; or mammalian cell systems (e.g., COS, CHO, BHK, 293, 3T3) harboring
recombinant expression constructs containing promoters derived from the genome of
mammalian cells (e.g., metallothionein promoter) or from mammalian viruses (e.g., the
adenovirus late promoter; the vaccinia virus 7.5K promoter).

In bacterial systems, a number of expression vectors may be advantageously selected
depending upon the use intended for the NUCP product being expressed. For example, when
a large quantity of such a protein is to be produced for the generation of
pharmaceutical compositions of or containing NUCP, or for raising antibodies to a NUCP,
vectors that direct the expression of high levels of fusion protein products that are
readily purified may be desirable. Such vectors include, but are not limited to, the
E. coli expression vector pUR278 (Ruther et al., 1983, EMBO J. 2:1791), in which a
NUCP coding sequence may be ligated individually into the vector in frame with the lacZ
coding region so that a fusion protein is produced; pIN vectors (Inouye & Inouye, 1985,
Nucleic Acids Res. 13:3101-3109; Van Heeke & Schuster, 1989, J. Biol. Chem.
264:5503-5509); and the like. pGEX vectors may also be used to express foreign
polypeptides as fusion proteins with glutathione S-transferase (GST). In general, such
fusion proteins are soluble and can easily be purified from lysed cells by adsorption to
glutathione-agarose beads followed by elution in the presence of free glutathione. The
pGEX vectors are designed to include thrombin or factor Xa protease cleavage sites so
that the cloned target gene product can be released from the GST moiety.

In an insect system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as
a vector to express foreign genes. The virus grows in Spodoptera frugiperda cells. The
NUCP gene coding sequence may be cloned individually into non-essential regions (for
example the polyhedrin gene) of the virus and placed under control of an AcNPV promoter
(for example the polyhedrin promoter). Successful insertion of the NUCP gene coding
sequence will result in inactivation of the polyhedrin gene and production of
non-occluded recombinant virus (i.e., virus lacking the proteinaceous coat coded for by
the polyhedrin gene). These recombinant viruses are then used to infect Spodoptera
frugiperda cells in which the inserted gene is expressed (e.g., see Smith et al., 1983,
J. Virol. 46:584; Smith, U.S. Pat. No. 4,215,051).

In mammalian host cells, a number of viral-based expression systems may be utilized. In
cases where an adenovirus is used as an expression vector, the NUCP gene nucleotide
sequence of interest may be ligated to an adenovirus transcription/translation control
complex, e.g., the late promoter and tripartite leader sequence. This chimeric gene may
then be inserted in the adenovirus genome by in vitro or in vivo recombination.
Insertion in a non-essential region of the viral genome (e.g., region E1 or E3) will
result in a recombinant virus that is viable and capable of expressing NUCP in infected
hosts (e.g., See Logan & Shenk, 1984, Proc. Natl. Acad. Sci. USA 81:3655-3659). Specific
initiation signals may also be required for efficient translation of NUCP transcripts.
These signals include the ATG initiation codon and adjacent sequences. In cases where an
entire NUCP gene or cDNA, including its own initiation codon and adjacent sequences, is
inserted into the appropriate expression vector, no additional translational control
signals may be needed (for example an independent ribosome entry site, or IRES, site).
However, in cases where only a portion of a NUCP coding sequence is inserted, exogenous
translational control signals, including, perhaps, the ATG initiation codon, must be
provided. Furthermore, the initiation codon must be in phase with the reading frame of
the desired coding sequence to ensure translation of the entire insert. These exogenous
translational control signals and initiation codons can have a variety of origins, both
natural and synthetic. The efficiency of expression may be enhanced by the inclusion of
appropriate transcription enhancer elements, transcription terminators, etc. (See Bittner
et al., 1987, Methods in Enzymol. 153:516-544).

In addition, a host cell strain may be chosen that modulates the expression of the
inserted sequences, or modifies and processes the gene product in the specific fashion
desired. Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of
protein products may be important for the function of the protein. Different host cells
have characteristic and specific mechanisms for the post-translational processing and
modification of proteins and gene products. Appropriate cell lines or host systems can be
chosen to ensure the correct modification and processing of the foreign protein
expressed. To this end, eukaryotic host cells which possess the cellular machinery for
proper processing of the primary transcript, glycosylation, and phosphorylation of the
gene product may be used. Such mammalian host cells include, but are not limited to, CHO,
VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and in particular, human cell lines.

For long-term, high-yield production of recombinant proteins, stable expression is
preferred. For example, cell lines that stably express the presently described NUCPs can
be engineered. Rather than using expression vectors which contain viral origins of
replication, host cells can be transformed with DNA controlled by appropriate expression
control elements (e.g., promoter, enhancer sequences, transcription terminators,
polyadenylation sites, etc.), and a selectable marker. Following the introduction of the
foreign DNA, engineered cells can be allowed to grow for 1-2 days in an enriched media,
and then are switched to a selective media. The selectable marker in the recombinant
plasmid confers resistance to the selection and allows cells to stably integrate the
plasmid into their chromosomes and grow to
form foci which in turn can be cloned and expanded into cell
lines. This method may advantageously be used to engineer
cell lines that express a NUCP. Such engineered cell lines
may be particularly useful in screening and evaluation of
compounds that affect the endogenous activity of the NUCP
product.

A number of selection systems may be used, including but
not limited to the herpes simplex virus thymidine kinase
(Wigler, et al., 1977, Cell 11:223), hypoxanthine-guanine
phosphoribosyltransferase (Szybalska & Szybalski, 1962,
Proc. Natl. Acad. Sci. USA 48:2026), and adenine
phosphoribosyltransferase (Lowy, et al., 1980, Cell 22:817) genes,
which can be employed in tk-, hgprt- or aprt- cells, respectively.
Also, antimetabolite resistance can be used as the basis of
selection for the following genes: dhfr, which confers
resistance to methotrexate (Wigler, et al., 1980, Proc. Natl. Acad. Sci.
USA 77:3567; O'Hare, et al., 1981, Proc. Natl. Acad. Sci.
USA 78:1527); gpt, which confers resistance to mycophenolic
acid (Mulligan & Berg, 1981, Proc. Natl. Acad. Sci.
USA 78:2072); neo, which confers resistance to the
aminoglycoside G-418 (Colberre-Garapin, et al., 1981, J. Mol.
Biol. 150:1); and hygro, which confers resistance to
hygromycin (Santerre, et al., 1984, Gene 30:147).
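
The selection systems listed above can be summarized as a small lookup table. The following Python sketch is illustrative only: the record layout and the helper name `genes_conferring_resistance` are assumptions made for this example, while the gene names, host backgrounds, and selective agents are taken from the preceding paragraph.

```python
from typing import List, NamedTuple, Optional


class SelectionSystem(NamedTuple):
    gene: str
    host_requirement: Optional[str]  # deficient host background, if one is needed
    resistance_to: Optional[str]     # selective agent, if the marker confers resistance


# Entries follow the paragraph above; the structure itself is hypothetical.
SELECTION_SYSTEMS = [
    SelectionSystem("tk", "tk- cells", None),
    SelectionSystem("hgprt", "hgprt- cells", None),
    SelectionSystem("aprt", "aprt- cells", None),
    SelectionSystem("dhfr", None, "methotrexate"),
    SelectionSystem("gpt", None, "mycophenolic acid"),
    SelectionSystem("neo", None, "G-418"),
    SelectionSystem("hygro", None, "hygromycin"),
]


def genes_conferring_resistance(agent: str) -> List[str]:
    """Return the marker genes listed as conferring resistance to `agent`."""
    wanted = agent.lower()
    return [
        s.gene
        for s in SELECTION_SYSTEMS
        if s.resistance_to is not None and s.resistance_to.lower() == wanted
    ]


if __name__ == "__main__":
    print(genes_conferring_resistance("G-418"))
```

The first three entries rely on complementation of a deficient host, whereas the remaining four rely on resistance to an antimetabolite or antibiotic, which is why the two kinds of requirement are kept in separate fields.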

Alternatively, any fusion protein may be readily purified
by utilizing an antibody specific for the fusion protein being
expressed. For example, a system described by Janknecht et
al. allows for the ready purification of non-denatured fusion
proteins expressed in human cell lines (Janknecht, et al.,
1991, Proc. Natl. Acad. Sci. USA 88:8972-8976). In this
system, the gene of interest is subcloned into a vaccinia
recombination plasmid such that the gene's open reading
frame is translationally fused to an amino-terminal tag
consisting of six histidine residues. Extracts from cells
infected with recombinant vaccinia virus are loaded onto
Ni2+-nitriloacetic acid-agarose columns and histidine-tagged
proteins are selectively eluted with imidazole-containing
buffers.
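
The in-frame, amino-terminal six-histidine fusion described above can be checked in silico. The sketch below is a minimal illustration, not the Janknecht et al. construct: the function names, the choice of CAC as the histidine codon, and the short example ORF are all assumptions made for this example. It builds the standard genetic code table, prepends a start codon plus six His codons to an ORF, and translates the result to confirm that the reading frame is preserved.

```python
# Standard genetic code, indexed in T, C, A, G order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Start codon followed by six histidine codons (CAC is an arbitrary choice;
# CAT also encodes histidine).
HIS6_TAG = "ATG" + "CAC" * 6


def translate(dna: str) -> str:
    """Translate a DNA coding sequence up to, but not including, the first stop codon."""
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def add_n_terminal_his_tag(orf: str) -> str:
    """Fuse the tag upstream of the ORF; the tag is a whole number of codons, so frame is kept."""
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three to stay in frame")
    return HIS6_TAG + orf


if __name__ == "__main__":
    example_orf = "ATGGCTAAATGA"  # hypothetical ORF encoding Met-Ala-Lys-stop
    print(translate(add_n_terminal_his_tag(example_orf)))
```

Because the tag contributes exactly seven codons, the downstream ORF is read in its original frame; a tag whose length is not a multiple of three would shift the frame and scramble the encoded protein.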

NUCP products can also be expressed in transgenic
animals. Animals of any species, including, but not limited
to, worms, mice, rats, rabbits, guinea pigs, pigs, micro-pigs,
birds, goats, and non-human primates, e.g., baboons,
monkeys, and chimpanzees may be used to generate NUCP
transgenic animals.

Any technique known in the art may be used to introduce
a NUCP transgene into animals to produce the founder lines
of transgenic animals. Such techniques include, but are not
limited to, pronuclear microinjection (Hoppe, P. C. and
Wagner, T. E., 1989, U.S. Pat. No. 4,873,191); retrovirus
mediated gene transfer into germ lines (Van der Putten et al.,
1985, Proc. Natl. Acad. Sci., USA 82:6148-6152); gene
targeting in embryonic stem cells (Thompson et al., 1989,
Cell 56:313-321); electroporation of embryos (Lo, 1983,
Mol. Cell. Biol. 3:1803-1814); and sperm-mediated gene
transfer (Lavitrano et al., 1989, Cell 57:717-723); etc. For a
review of such techniques, see Gordon, 1989, Transgenic
Animals, Intl. Rev. Cytol. 115:171-229, which is
incorporated by reference herein in its entirety.

The present invention provides for transgenic animals that
carry a NUCP transgene in all their cells, as well as animals
which carry the transgene in some, but not all their cells, i.e.,
mosaic animals or somatic cell transgenic animals. The
transgene may be integrated as a single transgene or in
concatamers, e.g., head-to-head tandems or head-to-tail
tandems. The transgene may also be selectively introduced into
and activated in a particular cell type by following, for
example, the teaching of Lasko et al., 1992, Proc. Natl.
Acad. Sci. USA 89:6232-6236. The regulatory sequences
required for such a cell-type specific activation will depend
upon the particular cell type of interest, and will be apparent
to those of skill in the art.

When it is desired that the NUCP transgene be integrated 
into the chromosomal site of the endogenous NUCP gene, 
gene targeting is preferred. Briefly, when such a technique is 
to be utilized, vectors containing some nucleotide sequences 
homologous to the endogenous NUCP gene are designed for 
the purpose of integrating, via homologous recombination 
with chromosomal sequences, into and disrupting the func- 
tion of the nucleotide sequence of the endogenous NUCP 
gene (i.e., "knockout" animals). 

The transgene may also be selectively introduced into a 
particular cell type, thus inactivating the endogenous NUCP 
gene in only that cell type, by following, for example, the 
teaching of Gu et al., 1994, Science, 265:103-106. The 
regulatory sequences required for such a cell-type specific 
inactivation will depend upon the particular cell type of 
interest, and will be apparent to those of skill in the art. 

Once transgenic animals have been generated, the expres- 
sion of the recombinant NUCP gene may be assayed utiliz- 
ing standard techniques. Initial screening may be accom- 
plished by Southern blot analysis or PCR techniques to 
analyze animal tissues to assay whether integration of the 
transgene has taken place. The level of mRNA expression of 
the transgene in the tissues of the transgenic animals may 
also be assessed using techniques which include but are not 
limited to Northern blot analysis of tissue samples obtained 
from the animal, in situ hybridization analysis, and RT-PCR. 
Samples of NUCP gene-expressing tissue may also be
evaluated immunocytochemically using antibodies specific 
for the NUCP transgene product. 

5.3. Antibodies to NUCPs

Antibodies that specifically recognize one or more 
epitopes of a NUCP, or epitopes of conserved variants of a 
NUCP, or peptide fragments of a NUCP are also encom- 
passed by the invention. Such antibodies include but are not 
limited to polyclonal antibodies, monoclonal antibodies 
(mAbs), humanized or chimeric antibodies, single chain 
antibodies, Fab fragments, F(ab')2 fragments, fragments
produced by a Fab expression library, anti-idiotypic (anti-Id) 
antibodies, and epitope-binding fragments of any of the 
above. 

The antibodies of the invention can be used, for example, 
in the detection of a NUCP in a biological sample and may, 
therefore, be utilized as part of a diagnostic or prognostic 
technique whereby patients may be tested for abnormal 
amounts of a NUCP. Such antibodies may also be utilized in 
conjunction with, for example, compound screening 
schemes, as described below, for the evaluation of the effect 
of test compounds on expression and/or activity of a NUCP 
gene product. Additionally, such antibodies can be used in
conjunction with gene therapy to, for example, evaluate the
normal and/or engineered NUCP-expressing cells prior to
their introduction into the patient. Such antibodies may 
additionally be used as a method for inhibiting abnormally 
high NUCP activity. Such antibodies may, therefore,
be utilized as part of treatment methods.

For the production of antibodies, various host animals
may be immunized by injection with a NUCP, a NUCP
peptide (e.g., one corresponding to a functional domain of
a NUCP), truncated NUCP polypeptides (a NUCP in which
one or more domains have been deleted), functional
equivalents of the NUCP or mutants of the NUCP. Such host
animals may include but are not limited to rabbits, mice,
goats, and rats, to name but a few. Various adjuvants may be
used to increase the immunological response, depending on
the host species, including but not limited to Freund's
(complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as lysolecithin,
pluronic polyols, polyanions, peptides, oil emulsions,
keyhole limpet hemocyanin, dinitrophenol, and potentially
useful human adjuvants such as BCG (bacille Calmette-Guerin)
and Corynebacterium parvum. Polyclonal antibodies are
heterogeneous populations of antibody molecules derived
from the sera of the immunized animals.

Monoclonal antibodies, which are homogeneous
populations of antibodies to a particular antigen, may be obtained
by any technique which provides for the production of
antibody molecules by continuous cell lines in culture.
These include, but are not limited to, the hybridoma
technique of Kohler and Milstein, (1975, Nature 256:495-497;
and U.S. Pat. No. 4,376,110), the human B-cell hybridoma
technique (Kosbor et al., 1983, Immunology Today 4:72;
Cole et al., 1983, Proc. Natl. Acad. Sci. USA
80:2026-2030), and the EBV-hybridoma technique (Cole et
al., 1985, Monoclonal Antibodies And Cancer Therapy, Alan
R. Liss, Inc., pp. 77-96). Such antibodies may be of any
immunoglobulin class including IgG, IgM, IgE, IgA, IgD
and any subclass thereof. The hybridoma producing the
mAb of this invention may be cultivated in vitro or in vivo.
Production of high titers of mAbs in vivo makes this the
presently preferred method of production.

In addition, techniques developed for the production of
"chimeric antibodies" (Morrison et al., 1984, Proc. Natl.
Acad. Sci., 81:6851-6855; Neuberger et al., 1984, Nature,
312:604-608; Takeda et al., 1985, Nature, 314:452-454) by
splicing the genes from a mouse antibody molecule of
appropriate antigen specificity together with genes from a
human antibody molecule of appropriate biological activity
can be used. A chimeric antibody is a molecule in which
different portions are derived from different animal species,
such as those having a variable region derived from a murine
mAb and a human immunoglobulin constant region.

Alternatively, techniques described for the production of
single chain antibodies (U.S. Pat. No. 4,946,778; Bird, 1988,
Science 242:423-426; Huston et al., 1988, Proc. Natl. Acad.
Sci. USA 85:5879-5883; and Ward et al., 1989, Nature
334:544-546) can be adapted to produce single chain
antibodies against NUCP gene products. Single chain antibodies
are formed by linking the heavy and light chain fragments of
the Fv region via an amino acid bridge, resulting in a single
chain polypeptide.
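
At the sequence level, the linkage described above amounts to joining the heavy and light chain variable fragments through a short peptide bridge. The Python sketch below is illustrative only: the function name `make_scfv`, the placeholder fragment sequences, and the choice of a (Gly4Ser)3 bridge are assumptions for this example; the text does not specify a particular linker.

```python
# Assumption: a (Gly4Ser)3 flexible linker, a commonly used bridge for
# single chain Fv constructs. The specification does not name a linker.
DEFAULT_LINKER = "GGGGS" * 3


def make_scfv(vh: str, vl: str, linker: str = DEFAULT_LINKER) -> str:
    """Join heavy (VH) and light (VL) variable fragments into one polypeptide."""
    if not vh or not vl:
        raise ValueError("both variable fragments are required")
    return vh + linker + vl


if __name__ == "__main__":
    # Hypothetical four-residue placeholders standing in for real VH and VL fragments.
    print(make_scfv("EVQL", "DIQM"))
```

The result is a single polypeptide, so it can be encoded by one open reading frame and expressed from a single construct.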

Antibody fragments that recognize specific epitopes can
be generated using known techniques. For example, such
fragments include, but are not limited to: the F(ab')2
fragments which can be produced by pepsin digestion of the
antibody molecule and the Fab fragments which can be
generated by reducing the disulfide bridges of the F(ab')2
fragments. Alternatively, Fab expression libraries may be
constructed (Huse et al., 1989, Science, 246:1275-1281) to
allow rapid and easy identification of monoclonal Fab
fragments with the desired specificity.

Antibodies to a NUCP can, in turn, be utilized to generate
anti-idiotype antibodies that "mimic" a given NUCP, using
techniques well known to those skilled in the art. (See, e.g.,
Greenspan & Bona, 1993, FASEB J 7(5):437-444; and
Nissinoff, 1991, J. Immunol. 147(8):2429-2438). For
example, antibodies that bind to a NUCP domain and
competitively inhibit the binding of a NUCP to its cognate
ligand, chaperonin, or accessory molecule(s) can be used to
generate anti-idiotypes that "mimic" the NUCP and,
therefore, bind and activate or neutralize a receptor. Such
anti-idiotypic antibodies or Fab fragments of such
anti-idiotypes can be used in therapeutic regimens involving a
NUCP-mediated process or pathway.

5.4. Diagnosis of Abnormalities Related to a NUCP 

A variety of methods can be employed for the diagnostic 
and prognostic evaluation of disorders related to NUCP 
function, and for the identification of subjects having a 
predisposition to such disorders. 

Such methods may, for example, utilize reagents such as 
the NUCP nucleotide sequences described above and the 
NUCP antibodies described above. Specifically, such 
reagents may be used, for example, for: (1) the detection of 
the presence of NUCP gene mutations, or the detection of 
either over- or under-expression of NUCP mRNA relative to 
a given phenotype; (2) the detection of either an over- or an 
under-abundance of NUCP gene product relative to a given 
phenotype; and (3) the detection of perturbations or abnor- 
malities in any metabolic, physiologic, or catabolic pathway 
mediated by NUCP. 

The methods described herein may be performed, for 
example, by utilizing pre-packaged diagnostic kits compris- 
ing at least one specific NUCP nucleotide sequence or 
NUCP antibody reagent described herein, which may be 
conveniently used, e.g., in clinical settings, to diagnose 
patients exhibiting, for example, body weight disorder 
abnormalities. 

For the detection of NUCP mutations, any nucleated cell 
can be used as a starting source for genomic nucleic acid. 
For the detection of NUCP gene expression or NUCP gene 
products, any cell type or tissue in which the NUCP gene is 
expressed, such as, for example, kidney cells, may be 
utilized. 

Nucleic acid-based detection techniques are described, 
below, in Section 5.4.1. Peptide detection techniques are 
described, below, in Section 5.4.2. 

5.4.1. Detection of NUCP Sequences

Mutations within a NUCP nucleotide sequence can be 
detected by utilizing a number of techniques. Nucleic acid 
from any nucleated cell can be used as the starting point for 
such assay techniques, and can be isolated according to 
standard nucleic acid preparation procedures which are well 
known to those of skill in the art. 

DNA may be used in hybridization or amplification assays
of biological samples to detect abnormalities involving 
NUCP gene structure, including point mutations, insertions, 
deletions and chromosomal rearrangements. Such assays 
may include, but are not limited to, Southern analyses, single 
stranded conformational polymorphism analyses (SSCP), 
and PCR analyses. 
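
Once a sample sequence has been determined, the classes of abnormality named above (point mutations, insertions, deletions) can be distinguished computationally by comparing it with a reference. The sketch below is an in silico analogue, not one of the wet-lab assays listed; the function name, the return format, and the example sequences are assumptions made for this illustration. It uses Python's standard `difflib.SequenceMatcher` to report each difference.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def classify_differences(reference: str, sample: str) -> List[Tuple[str, int, str, str]]:
    """List differences as (kind, reference position, reference bases, sample bases)."""
    changes = []
    matcher = SequenceMatcher(None, reference, sample, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag == "replace":
            # A one-for-one base change is reported as a point mutation.
            single = (i2 - i1 == 1) and (j2 - j1 == 1)
            kind = "point mutation" if single else "substitution"
        elif tag == "delete":
            kind = "deletion"
        else:  # tag == "insert"
            kind = "insertion"
        changes.append((kind, i1, reference[i1:i2], sample[j1:j2]))
    return changes


if __name__ == "__main__":
    reference = "ATGGCCAAGT"  # hypothetical reference fragment
    for sample in ("ATGGCTAAGT", "ATGGAAGT", "ATGGCCTTAAGT"):
        print(sample, classify_differences(reference, sample))
```

`SequenceMatcher` is a general-purpose text comparison tool rather than a biological aligner, so for long or repetitive sequences a dedicated alignment program would be the more appropriate choice; it is used here only to keep the sketch self-contained.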

Such diagnostic methods for the detection of NUCP 
gene-specific mutations can involve, for example, contacting
and incubating nucleic acids including recombinant DNA 
molecules, cloned genes or degenerate variants thereof, 
obtained from a sample, e.g., derived from a patient sample 
or other appropriate cellular source, with one or more 
labeled nucleic acid reagents including recombinant DNA 
molecules, cloned genes or degenerate variants thereof, as 
described above, under conditions favorable for the specific 
annealing of these reagents to their complementary 
sequences within a NUCP gene. Preferably, the lengths of 
these nucleic acid reagents are at least about 15 to about 30 
nucleotides. After incubation, all non-annealed nucleic acids 
are removed from the nucleic acid:NUCP molecule hybrid.
The presence of nucleic acids which have hybridized, if any
such molecules exist, is then detected. Using such a detection
scheme, the nucleic acid from the cell type or tissue of
interest can be immobilized, for example, to a solid support
such as a membrane, or a plastic surface such as that on a
microtiter plate or polystyrene beads. In this case, after
incubation, non-annealed, labeled nucleic acid reagents of
the type described above are easily removed. Detection of
the remaining annealed, labeled NUCP nucleic acid reagents
is accomplished using standard techniques well known to
those in the art. The NUCP encoding nucleotide sequences
to which the nucleic acid reagents have annealed can be
compared to the annealing pattern expected from a normal
NUCP gene sequence in order to determine whether a NUCP
gene mutation is present.

Alternative diagnostic methods for the detection of NUCP
gene specific nucleic acid molecules, in patient samples or
other appropriate cell sources, may involve their
amplification, e.g., by PCR (the experimental embodiment
set forth in Mullis, K. B., 1987, U.S. Pat. No. 4,683,202),
followed by the detection of the amplified molecules using
techniques well known to those of skill in the art. The
resulting amplified sequences can be compared to those
which would be expected if the nucleic acid being amplified
contained only normal copies of a NUCP gene in order to
determine whether a NUCP gene mutation exists.

Additionally, well-known genotyping techniques can be
performed to identify individuals carrying NUCP gene
mutations. Such techniques include, for example, the use of
restriction fragment length polymorphisms (RFLPs), which
involve sequence variations in one of the recognition sites
for the specific restriction enzyme used.

Additionally, improved methods for analyzing DNA
polymorphisms which can be utilized for the identification of
NUCP gene mutations have been described which capitalize
on the presence of variable numbers of short, tandemly
repeated DNA sequences between the restriction enzyme
sites. For example, Weber (U.S. Pat. No. 5,075,217, which
is incorporated herein by reference in its entirety) describes
a DNA marker based on length polymorphisms in blocks of
(dC-dA)n-(dG-dT)n short tandem repeats. The average
separation of (dC-dA)n-(dG-dT)n blocks is estimated to be
30,000-60,000 bp. Markers which are so closely spaced
exhibit a high frequency co-inheritance, and are extremely
useful in the identification of genetic mutations, such as, for
example, mutations within the NUCP gene, and the
diagnosis of diseases and disorders related to NUCP mutations.

Also, Caskey et al. (U.S. Pat. No. 5,364,759, which is

of the expression pattern of the NUCP gene, including
activation or inactivation of NUCP gene expression.

In one embodiment of such a detection scheme, cDNAs
are synthesized from the RNAs of interest (e.g., by reverse
transcription of the RNA molecule into cDNA). A sequence
within the cDNA is then used as the template for a nucleic
acid amplification reaction, such as a PCR amplification
reaction, or the like. The nucleic acid reagents used as
synthesis initiation reagents (e.g., primers) in the reverse
transcription and nucleic acid amplification steps of this
method are chosen from among the NUCP nucleic acid
reagents described above. The preferred lengths of such
nucleic acid reagents are at least 9-30 nucleotides. For
detection of the amplified product, the nucleic acid
amplification may be performed using radioactively or
non-radioactively labeled nucleotides. Alternatively, enough
amplified product may be made such that the product may be
visualized by standard ethidium bromide staining, by
utilizing any other suitable nucleic acid staining method, or by
sequencing.

Additionally, it is possible to perform such NUCP gene
expression assays "in situ", i.e., directly upon tissue sections
(fixed and/or frozen) of patient tissue obtained from biopsies
or resections, such that no nucleic acid purification is
necessary. Nucleic acid reagents such as those described in
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Section 5.1 may be used as probes and/or primers for such 
in situ procedures (See, for example, Nuovo, G. J., 1992, 
"PCR In Situ Hybridization: Protocols And Applications", 
Raven Press, N.Y). 

Alternatively, if a sufficient quantity of the appropriate 
cells can be obtained, standard Northern analysis can be 
performed to determine the level of mRNA expression of the 
NUCP gene. 

5.4.2. Detention of NUCP Products 

Antibodies directed against wild type or mutant NUCPs, 
or conserved variants or peptide fragments thereof, as dis- 
cussed above, can also be used as diagnostics and 
prognostics, as described herein. Such diagnostic methods, 
may be used to detect abnormalities in the level of NUCP 
gene expression, or abnormalities in the structure and/or 
temporal, tissue, cellular, or subcellular location of the 
NUCP (besides mitochondria), and may be performed in 
vivo or in vitro, such as, for example, on biopsy tissue. 

For example, antibodies directed to one or more epitopes 
of NUCP can be used in vivo to detect the pattern and level 
of expression of NUCP in the body. Such antibodies can be 
labeled, e.g., with a radio-opaque or other appropriate com- 
pound and injected into a subject in order to visualize 



incorporated herein by reference in its entirety) describe a 50 binding to the NUCP expressed in the body using methods 



DNA profiling assay for detecting short tri and tetra nucle- 
otide repeat sequences. The process includes extracting the 
DNA of interest, such as the NUCP gene, amplifying the 
extracted DNA, and labeling the repeat sequences to form a 
genotypic map of the individual's DNA. 

The level of NUCP gene expression can also be assayed 
by detecting and measuring NUCP transcription. For 
example, RNA from a cell type or tissue known, or sus- 
pected to express the NUCP gene, such as kidney, may be 



such as X-rays, CAT-scans, or MRL Labeled antibody 
fragments, e.g., the Fab or single chain antibody comprising 
the smallest portion of the antigen binding region, are 
preferred for this purpose to promote crossing the blood- 
55 brain barrier and permit labeling of NUCP expressed in the 
brain. 

Additionally, any NUCP fusion protein or NUCP conju- 
gated protein whose presence can be detected, can be 
administered. For example, NUCP fusion or conjugated 



isolated and tested utilizing hybridization or PCR techniques 60 proteins labeled with a radio -opaque or other appropriate 



such as those described above. The isolated cells can be 
derived from cell culture or from a patient. The analysis of 
cells taken from culture may be a necessary step in the 
assessment of cells to be used as part of a cell-based gene 
therapy technique or, alternatively, to test the effect of 65 
compounds on the expression of the NUCP gene. Such 
analyses may reveal both quantitative and qualitative aspects 



compound can be administered and visualized in vivo, as 
discussed, above for labeled antibodies. Further such NUCP 
fusion proteins (such as AP-NUCP or NUCP-AP) can be 
utilized for in vitro diagnostic procedures. 

Alternatively, immunoassays or fusion protein detection 
assays, as described above, can be utilized on biopsy and 
autopsy samples in vitro to permit assessment of the expres- 
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sion pattern of the NUCR Such assays are not confined to the 
use of antibodies that define a NUCP domain, but can 
include the use of antibodies directed to epitopes of any 
domain of a NUCP. The use of each or all of these labeled 
antibodies will yield useful information regarding transla- 5 
tion and intracellular transport of the NUCP to the cell 
surface and can identify defects in processing. 

The tissue or cell type to be analyzed will generally 
include those which are known, or suspected, to express the 
NUCP gene, such as, for example, epithelial cells, kidney
cells, adipose tissue, brain cells, etc. The protein isolation 
methods employed herein may, for example, be such as 
those described in Harlow and Lane (Harlow, E. and Lane, 
D., 1988, "Antibodies: A Laboratory Manual", Cold Spring 
Harbor Laboratory Press, Cold Spring Harbor, N.Y.), which
is incorporated herein by reference in its entirety. The 
isolated cells can be derived from cell culture or from a 
patient. The analysis of cells taken from culture may be a 
necessary step in the assessment of cells that could be used 
as part of a cell-based gene therapy technique or,
alternatively, to test the effect of compounds on the expres- 
sion of the NUCP gene. 

For example, antibodies, or fragments of antibodies, such
as those described above, useful in the present invention, may
be used to quantitatively or qualitatively detect the presence
of a NUCP, or conserved variants or peptide fragments 
thereof. This can be accomplished, for example, by immu- 
nofluorescence techniques employing a fluorescently 
labeled antibody (see below, this Section) coupled with light 
microscopic, flow cytometric, or fluorimetric detection.
Such techniques are especially preferred if such NUCP 
products can be found, at least transiently, on the cell 
surface. 

The antibodies (or fragments thereof) or NUCP fusion or
conjugated proteins useful in the present invention may
additionally be employed histologically, as in
immunofluorescence, immunoelectron microscopy or
non-immuno assays, for in situ detection of NUCP gene products
or conserved variants or peptide fragments thereof, or to 
assay NUCP binding (in the case of labeled NUCP-fusion 
protein). 

In situ detection may be accomplished by removing a 
histological specimen from a patient, and applying thereto a 
labeled antibody or fusion protein of the present invention.
The antibody (or fragment) or fusion protein is preferably 
applied by overlaying the labeled antibody (or fragment) 
onto a biological sample. Through the use of such a 
procedure, it is possible to determine not only the presence 
of the NUCP product, or conserved variants or peptide
fragments, or NUCP binding, but also its distribution in the 
examined tissue. Using the present invention, those of 
ordinary skill will readily perceive that any of a wide variety 
of histological methods (such as staining procedures) can be 
modified in order to achieve such in situ detection.

Immunoassays and non-immunoassays for a NUCP, or 
conserved variants or peptide fragments thereof, will typi- 
cally comprise incubating a sample, such as a biological 
fluid, a tissue extract, freshly harvested cells, or lysates of 
cells which have been incubated in cell culture, in the
presence of a detectably labeled antibody capable of iden- 
tifying NUCP products or conserved variants or peptide 
fragments thereof, and detecting the bound antibody by any 
of a number of techniques well-known in the art. 
Alternatively, the labeled antibody can be directed against an
antigenic tag that has been directly or indirectly attached to
a NUCP.

The biological sample may be brought in contact with and
immobilized onto a solid phase support or carrier such as 
nitrocellulose, or other solid support which is capable of 
immobilizing cells, cell particles or soluble proteins. The 
support may then be washed with suitable buffers followed 
by treatment with the detectably labeled NUCP antibody or 
NUCP ligand/accessory molecule fusion protein. The solid 
phase support may then be washed with the buffer a second 
time to remove unbound antibody or fusion protein. The 
amount of bound label on solid support may then be detected
by conventional means. 

By "solid phase support or carrier" is intended any 
support capable of binding an antigen or an antibody. 
Well-known supports or carriers include glass, polystyrene, 
polypropylene, polyethylene, dextran, nylon, amylases, 
natural and modified celluloses, polyacrylamides, gabbros, 
and magnetite. The nature of the carrier can be either soluble 
to some extent or insoluble for the purposes of the present 
invention. The support material may have virtually any 
possible structural configuration so long as the coupled 
molecule is capable of binding to an antigen or antibody. 
Thus, the support configuration may be spherical, as in a 
bead, or cylindrical, as in the inside surface of a test tube, or 
the external surface of a rod. Alternatively, the surface may
be flat such as a sheet, test strip, etc. Preferred supports
include polystyrene beads. Those skilled in the art will know 
many other suitable carriers for binding antibody or antigen, 
or will be able to ascertain the same by use of routine 
experimentation. 

The binding activity of a given lot of NUCP antibody or 
NUCP ligand fusion protein may be determined according to
well known methods. Those skilled in the art will be able to 
determine operative and optimal assay conditions for each 
determination by employing routine experimentation. 

With respect to antibodies, one of the ways in which the 
NUCP antibody can be detectably labeled is by linking the 
same to an enzyme for use in an enzyme immunoassay
(EIA) (Voller, A., "The Enzyme Linked Immunosorbent 
Assay (ELISA)", 1978, Diagnostic Horizons 2:1-7, Micro- 
biological Associates Quarterly Publication, Walkersville, 
Md.); Voller, A. et al., 1978, J. Clin. Pathol. 31:507-520; 
Butler, J. E., 1981, Meth. Enzymol. 73:482-523; Maggio, E. 
(ed.), 1980, Enzyme Immunoassay, CRC Press, Boca Raton, 
Fla.; Ishikawa, E. et al., (eds.), 1981, Enzyme Immunoassay,
Igaku Shoin, Tokyo). The enzyme that is bound to the
antibody will react with an appropriate substrate, preferably 
a chromogenic substrate, in such a manner as to produce a 
chemical moiety which can be detected, for example, by 
spectrophotometric, fluorimetric or by visual means. 
Enzymes which can be used to detectably label the antibody 
include, but are not limited to, malate dehydrogenase,
staphylococcal nuclease, delta-5-steroid isomerase, yeast
alcohol dehydrogenase, alpha-glycerophosphate
dehydrogenase, triose phosphate isomerase, horseradish
peroxidase, alkaline phosphatase, asparaginase, glucose 
oxidase, beta-galactosidase, ribonuclease, urease, catalase, 
glucose-6-phosphate dehydrogenase, glucoamylase and ace- 
tylcholinesterase. The detection can be accomplished by 
colorimetric methods which employ a chromogenic
substrate for the enzyme. Detection may also be accomplished
by visual comparison of the extent of enzymatic reaction of 
a substrate in comparison with similarly prepared standards. 
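
The comparison against similarly prepared standards can be sketched numerically. The following is only an illustrative sketch, not a procedure taken from this specification: it assumes a set of standards with known concentrations and measured absorbance readings, assumes the signal rises monotonically with concentration, and estimates an unknown sample by linear interpolation between the two bracketing standards. The function name and the example values are hypothetical.

```python
from bisect import bisect_left


def interpolate_concentration(standards, absorbance):
    """Estimate a concentration from an absorbance reading.

    standards: list of (concentration, absorbance) pairs for the
    prepared standards (hypothetical values; assumed monotonic).
    absorbance: reading obtained for the unknown sample.
    """
    # Order the standards by their absorbance reading.
    points = sorted(standards, key=lambda p: p[1])
    readings = [a for _, a in points]
    if not readings[0] <= absorbance <= readings[-1]:
        raise ValueError("reading outside the range of the standards")
    i = bisect_left(readings, absorbance)
    if readings[i] == absorbance:
        return points[i][0]
    # Linear interpolation between the two bracketing standards.
    (c0, a0), (c1, a1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (absorbance - a0) / (a1 - a0)


if __name__ == "__main__":
    example_standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45)]
    print(interpolate_concentration(example_standards, 0.35))
```

Readings outside the range spanned by the standards are rejected rather than extrapolated, since the text describes comparison with the standards only.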

Detection may also be accomplished using any of a 
variety of other immunoassays. For example, by radioac- 
tively labeling the antibodies or antibody fragments, it is 
possible to detect NUCP through the use of a radioimmu- 
noassay (RIA) (see, for example, Weintraub, B., Principles 
of Radioimmunoassays, Seventh Training Course on Radioligand
Assay Techniques, The Endocrine Society, March,
1986, which is incorporated by reference herein). The radio- 
active isotope can be detected by such means as the use of 
a gamma counter or a scintillation counter or by
autoradiography.

It is also possible to label the antibody with a fluorescent 
compound. When the fluorescently labeled antibody is
exposed to light of the proper wavelength, its presence can
then be detected due to fluorescence. Among the most
commonly used fluorescent labeling compounds are fluo- 
rescein isothiocyanate, rhodamine, phycoerythrin, 
phycocyanin, allophycocyanin, o-phthaldehyde and fluores- 
camine. 

The antibody can also be detectably labeled using
fluorescence emitting metals such as 152Eu, or others of the
lanthanide series. These metals can be attached to the
antibody using such metal chelating groups as
diethylenetriaminepentaacetic acid (DTPA) or
ethylenediaminetetraacetic acid (EDTA).

The antibody also can be detectably labeled by coupling 
it to a chemiluminescent compound. The presence of the 
chemiluminescent-tagged antibody is then determined by 
detecting the presence of luminescence that arises during the 
course of a chemical reaction. Examples of particularly
useful chemiluminescent labeling compounds are luminol, 
isoluminol, theromatic acridinium ester, imidazole, acri- 
dinium salt and oxalate ester. 

Likewise, a bioluminescent compound may be used to
label the antibody of the present invention. Bioluminescence 
is a type of chemiluminescence found in biological systems 
in which a catalytic protein increases the efficiency of the
chemiluminescent reaction. The presence of a
bioluminescent protein is determined by detecting the presence of
luminescence. Important bioluminescent compounds for 
purposes of labeling are luciferin, luciferase and aequorin. 

5.5. Screening Assays for Compounds that Modulate 
NUCP Expression or Activity 

The following assays are designed to identify compounds
that interact with (e.g., bind to) a NUCP, compounds that 
interfere with the interaction of a NUCP with any ligand or 
accessory molecules, compounds that modulate NUCP gene 
expression (i.e., modulate the level of NUCP activity by 
regulating gene expression) or otherwise modulate the levels
of a NUCP in the body. Assays may additionally be utilized 
which identify compounds that bind to NUCP gene regula- 
tory sequences (e.g., promoter sequences) and, 
consequently, may modulate NUCP gene expression. See 
e.g., Platt, K. A., 1994, J. Biol. Chem. 269:28558-28562,
which is incorporated herein by reference in its entirety. 

The compounds which can be screened in accordance 
with the invention include but are not limited to peptides, 
antibodies and fragments thereof, and other organic
compounds (e.g., peptidomimetics) that bind to NUCP and either
mimic the activity of the natural product (i.e., agonists) or 
inhibit the activity of the natural ligand/accessory molecule 
(i.e., antagonists); as well as peptides, antibodies or frag- 
ments thereof, and other organic compounds that mimic the 
NUCP (or a portion thereof) and bind to and "inactivate" or
"neutralize" the NUCP ligand/accessory protein. 

Such compounds may include, but are not limited to, 
peptides such as, for example, soluble peptides, including 
but not limited to members of random peptide libraries (see,
e.g., Lam, K. S. et al., 1991, Nature 354:82-84; Houghten,
R. et al., 1991, Nature 354:84-86), and combinatorial
chemistry-derived molecular libraries made of D- and/or
L-configuration amino acids, phosphopeptides (including, but
not limited to members of random or partially degenerate, 
directed phosphopeptide libraries; see, e.g., Songyang, Z. et 
al., 1993, Cell 72:767-778), antibodies (including, but not 
limited to, polyclonal, monoclonal, humanized,
anti-idiotype, chimeric or single chain antibodies, and Fab,
F(ab')2 and Fab expression library fragments, and epitope-
binding fragments thereof), and small organic or inorganic 
molecules. 

Other compounds that can be screened in accordance with 
the invention include but are not limited to small organic 
molecules that are able to cross the blood-brain barrier, gain
entry into an appropriate cell (e.g., in the choroid plexus, 
pituitary, the hypothalamus, etc.) and affect the expression of 
a NUCP gene or some other gene involved in a NUCP 
mediated pathway (e.g., by interacting with the regulatory 
region or transcription factors involved in gene expression); 
or such compounds that affect or substitute for the activity 
of the NUCP or the activity of some other intracellular factor 
involved in a NUCP-mediated catabolic, or metabolic path- 
way. 

Computer modeling and searching technologies permit 
identification of compounds, or the improvement of already 
identified compounds, that can modulate NUCP expression 
or activity. Having identified such a compound or 
composition, the active sites or regions are identified. Such 
active sites might typically be ligand binding sites. The 
active site can be identified using methods known in the art 
including, for example, from the amino acid sequences of 
peptides, from the nucleotide sequences of nucleic acids, or 
from study of complexes of the relevant compound or 
composition with its natural ligand. In the latter case, 
chemical or X-ray crystallographic methods can be used to 
find the active site by finding where on the factor the 
complexed ligand is found. 

Next, the three dimensional geometric structure of the 
active site is determined. This can be done by known 
methods, including X-ray crystallography, which can deter- 
mine a complete molecular structure. On the other hand, 
solid or liquid phase NMR can be used to determine certain 
intra-molecular distances. Any other experimental method 
of structure determination can be used to obtain partial or 
complete geometric structures. The geometric structures 
may be measured with a complexed ligand, natural or 
artificial, which may increase the accuracy of the active site 
structure determined. 

If an incomplete or insufficiently accurate structure is 
determined, the methods of computer based numerical mod- 
eling can be used to complete the structure or improve its 
accuracy. Any recognized modeling method may be used, 
including parameterized models specific to particular 
biopolymers such as proteins or nucleic acids, molecular 
dynamics models based on computing molecular motions, 
statistical mechanics models based on thermal ensembles, or 
combined models. For most types of models, standard 
molecular force fields, representing the forces between con- 
stituent atoms and groups, are necessary, and can be selected 
from force fields known in physical chemistry. The incom- 
plete or less accurate experimental structures can serve as 
constraints on the complete and more accurate structures 
computed by these modeling methods. 

Finally, having determined the structure of the active site 
(or binding site), either experimentally, by modeling, or by 
a combination, candidate modulating compounds can be 
identified by searching databases containing compounds 
along with information on their molecular structure. Such a 
search seeks compounds having structures that match the
determined active site structure and that interact with the 
groups defining the active site. Such a search can be manual, 
but is preferably computer assisted. The compounds found
from this search are potential NUCP modulating
compounds.

Alternatively, these methods can be used to identify 
improved modulating compounds from an already known 
modulating compound or ligand. The composition of the 
known compound can be modified and the structural effects
of modification can be determined using the experimental 
and computer modeling methods described above applied to 
the new composition. The altered structure is then compared 
to the active site structure of the compound to determine if 
an improved fit or interaction results. In this manner
systematic variations in composition, such as by varying side
groups, can be quickly evaluated to obtain modified modu- 
lating compounds or ligands of improved specificity or 
activity. 
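
The systematic variation of side groups described above can be pictured as a simple enumerate-and-rank loop. The sketch below is purely illustrative and makes several assumptions not found in this specification: the position labels, the candidate side groups, and the scoring function are all hypothetical placeholders, and in practice the score would come from the experimental or computer modeling methods discussed above rather than from a lookup table.

```python
from itertools import product


def rank_variants(positions, score):
    """Enumerate every combination of side groups and rank them.

    positions: dict mapping a (hypothetical) position label to the
    list of candidate side groups at that position.
    score: callable taking a dict {position: group} and returning a
    number, where higher means a better predicted fit (assumption).
    """
    labels = sorted(positions)
    variants = [
        dict(zip(labels, combo))
        for combo in product(*(positions[label] for label in labels))
    ]
    # Best-scoring variant first.
    return sorted(variants, key=score, reverse=True)


def toy_score(variant):
    # Placeholder scoring table; stands in for a real fit evaluation.
    weights = {"OH": 2.0, "CH3": 1.0, "H": 0.0}
    return sum(weights[group] for group in variant.values())


if __name__ == "__main__":
    candidates = {"R1": ["H", "OH"], "R2": ["H", "CH3"]}
    for variant in rank_variants(candidates, toy_score):
        print(variant, toy_score(variant))
```

Exhaustive enumeration is only practical for a small number of positions and groups; it is used here because it makes the idea of systematic variation explicit.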

Further experimental and computer modeling methods
useful to identify modulating compounds based upon iden- 
tification of the active sites (or binding sites) of NUCP, and 
related transduction and transcription factors will be appar- 
ent to those of skill in the art. 

Examples of molecular modeling systems are the 
CHARMm and QUANTA programs (Polygen Corporation, 
Waltham, Mass.). CHARMm performs the energy minimi- 
zation and molecular dynamics functions. QUANTA
performs the construction, graphic modeling and analysis of
molecular structure. QUANTA allows interactive
construction, modification, visualization, and analysis of the 
behavior of molecules with each other. 

A number of articles review computer modeling of drugs 
interactive with specific proteins, such as Rotivinen, et al.,
1988, Acta Pharmaceutica Fennica 97:159-166; Ripka,
New Scientist 54-57 (Jun. 16, 1988); McKinlay and
Rossmann, 1989, Annu. Rev. Pharmacol. Toxicol.
29:111-122; Perry and Davies, QSAR: Quantitative
Structure-Activity Relationships in Drug Design pp.
189-193 (Alan R. Liss, Inc. 1989); Lewis and Dean, 1989
Proc. R. Soc. Lond. 236:125-140 and 141-162; and, with 
respect to a model receptor for nucleic acid components, 
Askew, et al., 1989, J. Am. Chem. Soc. 111:1082-1090. 
Other computer programs that screen and graphically depict
chemicals are available from companies such as BioDesign, 
Inc. (Pasadena, Calif.), Allelix, Inc. (Mississauga, Ontario, 
Canada), and Hypercube, Inc. (Cambridge, Ontario). 
Although these are primarily designed for application to 
drugs specific to particular proteins, they can be adapted to
design of drugs specific to regions of DNA or RNA, once 
that region is identified. 

Although described above with reference to design and 
generation of compounds which could alter binding, one 
could also screen libraries of known compounds, including
natural products or synthetic chemicals, and biologically 
active materials, including proteins, for compounds which 
are inhibitors or activators. 

Cell-based systems can also be used to identify
compounds that bind (or mimic) NUCP as well as assess the
altered activity associated with such binding in living cells. 
One tool of particular interest for such assays is green 
fluorescent protein which is described, inter alia, in U.S. Pat. 
No. 5,625,048, herein incorporated by reference. Cells that 
may be used in such cellular assays include, but are not
limited to, leukocytes, or cell lines derived from leukocytes, 
lymphocytes, stem cells, including embryonic stem cells,
and the like. In addition, expression host cells (e.g., B95
cells, COS cells, CHO cells, OMK cells, fibroblasts, Sf9 
cells) genetically engineered to express a functional NUCP 
of interest and to respond to activation by the test, or natural, 
ligand, as measured by a chemical or phenotypic change, or 
induction of another host cell gene, can be used as an end 
point in the assay. 

Compounds identified via assays such as those described 
herein may be useful, for example, in elucidating the
biological function of NUCP. Such compounds can be
administered to a patient at therapeutically effective doses to treat
any of a variety of physiological or mental disorders. A 
therapeutically effective dose refers to that amount of the 
compound sufficient to result in any amelioration, 
impediment, prevention, or alteration of any biological 
symptom. 

Toxicity and therapeutic efficacy of such compounds can 
be determined by standard pharmaceutical procedures in cell 
cultures or experimental animals, e.g., for determining the 
LD50 (the dose lethal to 50% of the population) and the ED50
(the dose therapeutically effective in 50% of the population).
The dose ratio between toxic and therapeutic effects is the
therapeutic index and it can be expressed as the ratio
LD50/ED50. Compounds that exhibit large therapeutic
indices are preferred. While compounds that exhibit toxic side
effects may be used, care should be taken to design a 
delivery system that targets such compounds to the site of 
affected tissue in order to minimize potential damage to 
uninfected cells and, thereby, reduce side effects. 
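
As a worked illustration of the ratio defined above, the short sketch below computes the therapeutic index as the LD50 divided by the ED50 and orders candidate compounds by that value. The compound names and dose values are invented for illustration only, and both doses are assumed to be positive and expressed in the same units.

```python
def therapeutic_index(ld50, ed50):
    """Return the therapeutic index, i.e. the ratio LD50 / ED50.

    Both doses must be positive and in the same units (assumption).
    """
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("doses must be positive")
    return ld50 / ed50


def rank_by_index(compounds):
    """Sort (name, ld50, ed50) tuples, largest therapeutic index first."""
    return sorted(
        compounds,
        key=lambda c: therapeutic_index(c[1], c[2]),
        reverse=True,
    )


if __name__ == "__main__":
    # Hypothetical compounds and doses, for illustration only.
    examples = [("compound A", 500.0, 25.0), ("compound B", 300.0, 60.0)]
    for name, ld50, ed50 in rank_by_index(examples):
        print(name, therapeutic_index(ld50, ed50))
```

A larger value indicates a wider margin between the effective dose and the lethal dose, which is why compounds with large therapeutic indices are described as preferred.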

The data obtained from the cell culture assays and animal 
studies can be used in formulating a range of dosage for use 
in humans. The dosage of such compounds lies preferably 
within a range of circulating concentrations that include the 
ED50 with little or no toxicity. The dosage may vary within
this range depending upon the dosage form employed and 
the route of administration utilized. For any compound used 
in the method of the invention, the therapeutically effective 
dose can be estimated initially from cell culture assays. A 
dose may be formulated in animal models to achieve a 
circulating plasma concentration range that includes the IC50
(i.e., the concentration of the test compound which achieves 
a half-maximal inhibition of symptoms) as determined in 
cell culture. Such information can be used to more accu- 
rately determine useful doses in humans. Levels in plasma 
may be measured, for example, by high performance liquid 
chromatography. 
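
One simple way to estimate an IC50 from cell culture data is to interpolate between the two tested concentrations that bracket 50% inhibition. The sketch below is an assumption-laden illustration rather than a method stated in this specification: it takes hypothetical paired concentration and fractional-inhibition values, requires positive concentrations, and interpolates linearly on a log10 concentration scale. A full dose-response curve fit would normally be preferred where enough data points are available.

```python
import math


def estimate_ic50(concentrations, inhibition):
    """Estimate the concentration giving half-maximal (50%) inhibition.

    concentrations: positive test concentrations (any consistent unit).
    inhibition: matching fractional inhibition values between 0 and 1.
    Interpolates on a log10 concentration scale between the two points
    that bracket 0.5; raises ValueError if no such pair exists.
    """
    pairs = sorted(zip(concentrations, inhibition))
    for (c_lo, y_lo), (c_hi, y_hi) in zip(pairs, pairs[1:]):
        if y_lo <= 0.5 <= y_hi and y_hi > y_lo:
            fraction = (0.5 - y_lo) / (y_hi - y_lo)
            log_lo, log_hi = math.log10(c_lo), math.log10(c_hi)
            return 10 ** (log_lo + fraction * (log_hi - log_lo))
    raise ValueError("data do not bracket 50% inhibition")


if __name__ == "__main__":
    # Hypothetical dose-response data, for illustration only.
    print(estimate_ic50([1.0, 100.0], [0.0, 1.0]))
```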

Pharmaceutical compositions for use in accordance with 
the present invention may be formulated in conventional 
manner using one or more physiologically acceptable car- 
riers or excipients. Thus, the compounds and their physi- 
ologically acceptable salts and solvates may be formulated 
for administration by inhalation or insufflation (either 
through the mouth or the nose) or oral, buccal, parenteral, 
intracranial, intrathecal, or rectal administration. 

For oral administration, the pharmaceutical compositions 
may take the form of, for example, tablets or capsules 
prepared by conventional means with pharmaceutically 
acceptable excipients such as binding agents (e.g., pregela- 
tinised maize starch, polyvinylpyrrolidone or hydroxypropyl 
methylcellulose); fillers (e.g., lactose, microcrystalline cel- 
lulose or calcium hydrogen phosphate); lubricants (e.g., 
magnesium stearate, talc or silica); disintegrants (e.g., potato 
starch or sodium starch glycolate); or wetting agents (e.g., 
sodium lauryl sulphate). The tablets may be coated by 
methods well known in the art. Liquid preparations for oral 
administration may take the form of, for example, solutions, 
syrups or suspensions, or they may be presented as a dry
product for constitution with water or other suitable vehicle 
before use. Such liquid preparations may be prepared by 
conventional means with pharmaceutically acceptable addi- 
tives such as suspending agents (e.g., sorbitol syrup, cellu- 
lose derivatives or hydrogenated edible fats); emulsifying 
agents (e.g., lecithin or acacia); non-aqueous vehicles (e.g., 
almond oil, oily esters, ethyl alcohol or fractionated veg- 
etable oils); and preservatives (e.g., methyl or propyl-p- 
hydroxybenzoates or sorbic acid). The preparations may 
also contain buffer salts, flavoring, coloring and sweetening 
agents as appropriate. 

Preparations for oral administration may be suitably for- 
mulated to give controlled release of the active compound. 

For buccal administration the compositions may take the 
form of tablets or lozenges formulated in conventional 
manner. 

For administration by inhalation, the compounds for use 
according to the present invention are conveniently deliv- 
ered in the form of an aerosol spray presentation from 
pressurized packs or a nebulizer, with the use of a suitable 
propellant, e.g., dichlorodifluoromethane,
trichlorofluoromethane, dichlorotetrafluoroethane, carbon 
dioxide or other suitable gas. In the case of a pressurized 
aerosol the dosage unit may be determined by providing a 
valve to deliver a metered amount. Capsules and cartridges 
of e.g. gelatin for use in an inhaler or insufflator may be 
formulated containing a powder mix of the compound and a 
suitable powder base such as lactose or starch. 

The compounds may be formulated for parenteral admin- 
istration by injection, e.g., by bolus injection or continuous 
infusion. Formulations for injection may be presented in unit 
dosage form, e.g., in ampoules or in multi-dose containers, 
with an added preservative. The compositions may take such 
forms as suspensions, solutions or emulsions in oily or 
aqueous vehicles, and may contain formulatory agents such 
as suspending, stabilizing and/or dispersing agents. 
Alternatively, the active ingredient may be in powder form 
for constitution with a suitable vehicle, e.g., sterile
pyrogen-free water, before use.

The compounds may also be formulated in rectal com- 
positions such as suppositories or retention enemas, e.g., 
containing conventional suppository bases such as cocoa 
butter or other glycerides. 

In addition to the formulations described previously, the 
compounds may also be formulated as a depot preparation. 
Such long acting formulations may be administered by 
implantation (for example subcutaneously or 
intramuscularly) or by intramuscular injection. Thus, for 
example, the compounds may be formulated with suitable 
polymeric or hydrophobic materials (for example as an 
emulsion in an acceptable oil) or ion exchange resins, or as 
sparingly soluble derivatives, for example, as a sparingly 
soluble salt. 

The compositions may, if desired, be presented in a pack 
or dispenser device which may contain one or more unit 
dosage forms containing the active ingredient. The pack may 
for example comprise metal or plastic foil, such as a blister 
pack. The pack or dispenser device may be accompanied by 
instructions for administration. 

5.5.1. In Vitro Screening Assays for Compounds that Bind
to a NUCP

In vitro systems may be designed to identify compounds 
capable of interacting with (e.g., binding to) or mimicking a 
NUCP. The compounds identified can be useful, for 
example, in modulating the activity of wild type and/or 
mutant NUCP; can be useful in elaborating the biological
function of NUCP; can be utilized in screens for identifying 
compounds that disrupt normal NUCP interactions; or may 
themselves disrupt or activate such interactions. 

The principle of the assays used to identify compounds
that bind to a NUCP, or NUCP ligands/accessory molecules,
involves preparing a reaction mixture of NUCP and the test
compound under conditions and for a time sufficient to allow
the two components to interact and bind, thus forming a
complex which can be removed and/or detected in the
reaction mixture. The NUCP species used can vary
depending upon the goal of the screening assay. For example, where
agonists of a natural NUCP accessory molecule or ligand are
desired, a full length NUCP, or a soluble truncated NUCP, a
NUCP peptide, or NUCP fusion protein containing one or
more NUCP domains fused to a protein or polypeptide that
affords advantages in the assay system (e.g., labeling,
isolation of the resulting complex, etc.) can be utilized. Where
compounds that directly interact with a NUCP are sought,
peptides corresponding to NUCP and fusion proteins
containing a NUCP, or a portion thereof, can be used.

The screening assays can be conducted in a variety of 
ways. For example, one method to conduct such an assay 
would involve anchoring a NUCP, NUCP polypeptide, 
NUCP peptide, or fusion protein thereof, or the test substance 
onto a solid phase and detecting NUCP/test compound 
complexes anchored on the solid phase at the end of 
the reaction. In one embodiment of such a method, the 
NUCP reactant may be anchored onto a solid surface, and 
the test compound, which is not anchored, may be labeled, 
either directly or indirectly. 

In practice, microtiter plates may conveniently be utilized 
as the solid phase. The anchored component may be immobilized 
by non-covalent or covalent attachments. Non-covalent 
attachment may be accomplished by simply coating 
the solid surface with a solution of the protein and drying. 
Alternatively, an immobilized antibody, preferably a monoclonal 
antibody, specific for the protein to be immobilized 
may be used to anchor the protein to the solid surface. The 
surfaces may be prepared in advance and stored. 

In order to conduct the assay, the nonimmobilized component 
is added to the coated surface containing the 
anchored component. After the reaction is complete, unreacted 
components are removed (e.g., by washing) under 
conditions such that any complexes formed will remain 
immobilized on the solid surface. The detection of complexes 
anchored on the solid surface can be accomplished in 
a number of ways. Where the previously nonimmobilized 
component is pre-labeled, the detection of label immobilized 
on the surface indicates that complexes were formed. Where 
the previously nonimmobilized component is not pre-labeled, 
an indirect label can be used to detect complexes 
anchored on the surface; e.g., using a labeled antibody 
specific for the previously nonimmobilized component (the 
antibody, in turn, may be directly labeled or indirectly 
labeled with a labeled anti-Ig antibody). 

Alternatively, a reaction can be conducted in a liquid 
phase, the reaction products separated from unreacted 
components, and complexes detected; e.g., using an immobilized 
antibody specific for a NUCP, NUCP polypeptide, 
peptide or fusion protein, or the test compound to anchor any 
complexes formed in solution, and a labeled antibody specific 
for the other component of the possible complex to 
detect anchored complexes. 

Alternatively, cell-based assays can be used to identify 
compounds that interact with a NUCP. To this end, cell lines 
that express a NUCP, or cell lines (e.g., COS cells, CHO 
cells, fibroblasts, etc.) that have been genetically engineered 
to express a NUCP or a NUCP ligand/accessory molecule 
(e.g., by transfection or transduction of NUCP DNA, etc.) 
can be used. Interaction of the test compound with, for 
example, NUCP ligand expressed by the host cell can be 
determined by comparison or competition with native 
NUCP. 

5.5.2. Assays for Compounds that Interfere with NUCP 
Receptor/Intracellular or NUCP/Transmembrane Macro- 
molecule Interaction 

Macromolecules that interact with a NUCP are referred 
to, for purposes of this discussion, as "binding partners". 
These binding partners are likely to be involved in NUCP 
mediated biological pathways. Therefore, it is desirable to 
identify compounds that interfere with or disrupt the inter- 
action of such binding partners which may be useful in 
regulating or augmenting NUCP activity in the body and/or 
controlling disorders associated with NUCP activity (or a 
deficiency thereof). 

The basic principle of the assay systems used to identify 
compounds that interfere with the interaction between 
NUCP, or NUCP polypeptides, peptides or fusion proteins as 
described above (collectively, the NUCP moiety), and its 
binding partner or partners involves preparing a reaction 
mixture containing the NUCP moiety and the binding part- 
ner under conditions and for a time sufficient to allow the 
two to interact and bind, thus forming a complex. In order 
to test a compound for inhibitory activity, the reaction 
mixture is prepared in the presence and absence of the test 
compound. The test compound may be initially included in 
the reaction mixture, or may be added at a time subsequent 
to the addition of the NUCP moiety and its binding partner. 
Control reaction mixtures are incubated without the test 
compound or with a placebo. The formation of any com- 
plexes between the NUCP moiety and the binding partner is 
then detected. The formation of a complex in the control 
reaction, but not in the reaction mixture containing the test 
compound, indicates that the compound interferes with the 
interaction of the NUCP moiety and the interactive binding 
partner. Additionally, complex formation within reaction 
mixtures containing the test compound and normal NUCP 
may also be compared to complex formation within reaction 
mixtures containing the test compound and a mutant NUCP. 
This comparison may be important in those cases wherein it 
is desirable to identify compounds that specifically disrupt 
interactions of mutant, or mutated, NUCPs but not normal 
NUCPs. 

The assay for compounds that interfere with the interac- 
tion of the NUCP moiety and its binding partners can be 
conducted in a heterogeneous or homogeneous format. 
Heterogeneous assays involve anchoring either the NUCP 
moiety or the binding partner onto a solid phase and detect- 
ing complexes anchored on the solid phase at the end of the 
reaction. In homogeneous assays, the entire reaction is 
carried out in a liquid phase. In either approach, the order of 
addition of reactants can be varied to obtain different infor- 
mation about the compounds being tested. For example, test 
compounds that interfere with the interaction by competition 
can be identified by conducting the reaction in the presence 
of the test substance; i.e., by adding the test substance to the 
reaction mixture prior to, or simultaneously with, the NUCP 
moiety and interactive binding partner. Alternatively, test 
compounds that disrupt preformed complexes, e.g. com- 
pounds with higher binding constants that displace one of 
the components from the complex, can be tested by adding 
the test compound to the reaction mixture after complexes 
have been formed. The various formats are described briefly 
below. 

In a heterogeneous assay system, either the NUCP moiety 
or an interactive binding partner is anchored onto a solid 
surface, while the non-anchored species is labeled, either 
directly or indirectly. In practice, microtiter plates are conveniently 
utilized. The anchored species may be immobilized 
by non-covalent or covalent attachments. Non-covalent 
attachment may be accomplished simply by coating 
the solid surface with a solution of the NUCP moiety or 
binding partner and drying. Alternatively, an immobilized 
antibody specific for the species to be anchored may be used 
to anchor the species to the solid surface. The surfaces may 
be prepared in advance and stored. 

In order to conduct the assay, the partner of the immobilized 
species is exposed to the coated surface with or without 
the test compound. After the reaction is complete, unreacted 
components are removed (e.g., by washing) and any complexes 
formed will remain immobilized on the solid surface. 
The detection of complexes anchored on the solid surface 
can be accomplished in a number of ways. Where the 
non-immobilized species is pre-labeled, the detection of 
label immobilized on the surface indicates that complexes 
were formed. Where the non-immobilized species is not 
pre-labeled, an indirect label can be used to detect complexes 
anchored on the surface; e.g., using a labeled antibody 
specific for the initially non-immobilized species (the 
antibody, in turn, may be directly labeled or indirectly 
labeled with a labeled anti-Ig antibody). Depending upon the 
order of addition of reaction components, test compounds 
which inhibit complex formation or which disrupt preformed 
complexes can be detected. 

Alternatively, the reaction can be conducted in a liquid 
phase in the presence or absence of the test compound, the 
reaction products separated from unreacted components, and 
complexes detected; e.g., using an immobilized antibody 
specific for one of the binding components to anchor any 
complexes formed in solution, and a labeled antibody specific 
for the other partner to detect anchored complexes. 
Again, depending upon the order of addition of reactants to 
the liquid phase, test compounds which inhibit complex 
formation or which disrupt preformed complexes can be identified. 

In an alternate embodiment of the invention, a homogeneous 
assay can be used. In this approach, a preformed 
complex of the NUCP moiety and an interactive binding 
partner is prepared in which either the NUCP moiety or its 
binding partner is labeled, but the signal generated by the 
label is quenched due to formation of the complex (see, e.g., 
U.S. Pat. No. 4,190,496 by Rubenstein which utilizes this 
approach for immunoassays). The addition of a test substance 
that competes with and displaces one of the species 
from the preformed complex will result in the generation of 
a signal above background. In this way, test substances 
which disrupt NUCP/intracellular binding partner interaction 
can be identified. 

In a particular embodiment, a NUCP fusion can be 
prepared for immobilization. For example, NUCP or a 
peptide fragment can be fused to a glutathione-S-transferase 
(GST) gene using a fusion vector, such as pGEX-5X-1, in 
such a manner that its binding activity is maintained in the 
resulting fusion protein. The interactive binding partner can 
be purified and used to raise a monoclonal antibody, using 
methods routinely practiced in the art and/or described 
above. This antibody can be labeled with the radioactive 
isotope 125I, for example, by methods routinely practiced in 
the art. In a heterogeneous assay, e.g., the GST-NUCP fusion 
protein can be anchored to glutathione-agarose beads. The 
interactive binding partner can then be added in the presence 
or absence of the test compound in a manner that allows 
interaction and binding to occur. At the end of the reaction 
period, unbound material can be washed away, and the 
labeled monoclonal antibody can be added to the system and 
allowed to bind to the complexed components. The interaction 
between the NUCP moiety and the interactive binding 
partner can be detected by measuring the amount of radioactivity 
that remains associated with the glutathione-agarose 
beads. A successful inhibition of the interaction by the test 
compound will result in a decrease in measured radioactivity. 
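
The readout described above reduces to a simple calculation. The following is a minimal sketch, not part of the original disclosure, that expresses the decrease in bead-associated radioactivity as percent inhibition relative to the control reaction run without test compound. The function name, the optional background subtraction, and the example counts are illustrative assumptions.

```python
def percent_inhibition(control_cpm, test_cpm, background_cpm=0.0):
    """Percent reduction in bead-associated signal relative to the control.

    control_cpm:    counts measured without test compound (assumed units: cpm)
    test_cpm:       counts measured with test compound
    background_cpm: counts from beads without binding partner (assumption)
    """
    control_signal = control_cpm - background_cpm
    test_signal = test_cpm - background_cpm
    if control_signal <= 0:
        raise ValueError("control signal must exceed background")
    return 100.0 * (1.0 - test_signal / control_signal)


# Hypothetical counts, for illustration only.
print(round(percent_inhibition(control_cpm=12000, test_cpm=3500, background_cpm=500), 1))
```

A value near 0 indicates that the test compound did not reduce complex formation, while a value near 100 indicates that essentially no labeled antibody remained associated with the beads.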

Alternatively, the GST-NUCP moiety fusion protein and 
the interactive binding partner can be mixed together in 
liquid in the absence of the solid glutathione-agarose beads. 
The test compound can be added either during or after the 
species are allowed to interact. This mixture can then be 
added to the glutathione-agarose beads and unbound material 
is washed away. Again the extent of inhibition of the 
NUCP moiety/binding partner interaction can be detected by 
adding the labeled antibody and measuring the radioactivity 
associated with the beads. 

In another embodiment of the invention, these same 
techniques can be employed using peptide fragments that 
correspond to the binding domain(s) of the NUCP moiety 
and/or the interactive or binding partner (in cases where the 
binding partner is a protein), in place of one or both of the 
full length proteins. Any number of methods routinely 
practiced in the art can be used to identify and isolate the 
binding sites. These methods include, but are not limited to, 
mutagenesis of the gene encoding one of the proteins and 
screening for disruption of binding in a 
co-immunoprecipitation assay. Compensatory mutations in 
the gene encoding the second species in the complex can 
then be selected. Sequence analysis of the genes encoding 
the respective proteins will reveal the mutations that corre- 
spond to the region of the protein involved in interactive 
binding. Alternatively, one protein can be anchored to a solid 
surface using methods described above, and allowed to 
interact with and bind to its labeled binding partner, which 
has been treated with a proteolytic enzyme, such as trypsin. 
After washing, a relatively short, labeled peptide comprising 
the binding domain may remain associated with the solid 
material, which can be isolated and identified by amino acid 
sequencing. Also, once the gene coding for the intracellular 
binding partner is obtained, short gene segments can be 
engineered to express peptide fragments of the protein, 
which can then be tested for binding activity and purified or 
synthesized. 

For example, and not by way of limitation, the NUCP 
moiety can be anchored to a solid material as described 
above, by making a GST-NUCP moiety fusion protein and 
allowing it to bind to glutathione agarose beads. The inter- 
active binding partner can be labeled with a radioactive 
isotope, such as 35S, and cleaved with a proteolytic enzyme 
such as trypsin. Cleavage products can then be added to the 
anchored GST-NUCP moiety fusion protein and allowed to 
bind. After washing away unbound peptides, labeled bound 
material, representing the intracellular binding partner bind- 
ing domain, can be eluted, purified, and analyzed for amino 
acid sequence by well-known methods. Peptides so identi- 
fied can be produced synthetically or fused to appropriate 
facilitative proteins using recombinant DNA technology. 

The present invention is not to be limited in scope by the 
specific embodiments described herein, which are intended 
as single illustrations of individual aspects of the invention, 
and functionally equivalent methods and components are 
within the scope of the invention. Indeed, various modifi- 
cations of the invention, in addition to those shown and 
described herein will become apparent to those skilled in the 
art from the foregoing description and accompanying draw- 
ings. Such modifications are intended to fall within the scope 
of the appended claims. All patents, patent applications, and 
publications cited herein are hereby incorporated by refer- 



SEQUENCE LISTING 

<160> NUMBER OF SEQ ID NOS: 4 

<210> SEQ ID NO 1 

<211> LENGTH: 876 

<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 1 



atgtcagccc tcaactggaa gccgtttgtg tacggggggc tggcctccat cactgctgag 60 

tgcggtacat ttccaattga tttaaccaag acacggctcc agattcaagg ccagacgaat 120 

gatgcaaaat ttaaggaaat tagataccga ggaatgttgc acgcattagt gaggataggc 180 

agagaagaag ggctgaaagc actctactcg gggattgccc ccgcgatgtt acgccaggca 240 

tcctatggca ccatcaagat aggcacttac cagagcttga agcgactatt cattgaacgc 300 

ccagaagatg aaactctacc gataaatgtg atatgtggaa ttctgtctgg agtcatatct 360 

tcaaccattg ctaatccaac tgatgttttg aaaattcgga tgcaagcgca aagcaacacc 420 

attcaaggag gaatgatagg caacttcatg aacatttacc agcaagaggg gacaagagga 480 

ctgtggaagg gtgtgtccct tactgcgcag agggctgcta ttgttgttgg tgtggagctg 540 

ccggtctatg acatcaccaa gaagcatctt attctctcag gcctgatggg agacactgtg 600 

tatacccact tcctctcaag cttcacctgt ggtctggcag gggccctggc ctcaaaccct 660 

gttgatgttg tgaggacacg tatgatgaat cagagagtgc ttcgagatgg cagatgttct 720 

ggctacacag gaaccctgga ttgcttgtta cagacatgga agaatgaagg gttttttgct 780 

ctctataaag gcttttggcc aaattggttg agacttggtc cttggaatat cattttcttt 840 

gtgacatacg agcagttgaa gaaattggat ttgtga 876 



<210> SEQ ID NO 2 

<211> LENGTH: 291 

<212> TYPE: PRT 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 2 

Met Ser Ala Leu Asn Trp Lys Pro Phe Val Tyr Gly Gly Leu Ala Ser 
1               5                   10                  15 

Ile Thr Ala Glu Cys Gly Thr Phe Pro Ile Asp Leu Thr Lys Thr Arg 
            20                  25                  30 

Leu Gln Ile Gln Gly Gln Thr Asn Asp Ala Lys Phe Lys Glu Ile Arg 
        35                  40                  45 

Tyr Arg Gly Met Leu His Ala Leu Val Arg Ile Gly Arg Glu Glu Gly 
    50                  55                  60 

Leu Lys Ala Leu Tyr Ser Gly Ile Ala Pro Ala Met Leu Arg Gln Ala 
65                  70                  75                  80 

Ser Tyr Gly Thr Ile Lys Ile Gly Thr Tyr Gln Ser Leu Lys Arg Leu 
                85                  90                  95 

Phe Ile Glu Arg Pro Glu Asp Glu Thr Leu Pro Ile Asn Val Ile Cys 
            100                 105                 110 

Gly Ile Leu Ser Gly Val Ile Ser Ser Thr Ile Ala Asn Pro Thr Asp 
        115                 120                 125 

Val Leu Lys Ile Arg Met Gln Ala Gln Ser Asn Thr Ile Gln Gly Gly 
    130                 135                 140 

Met Ile Gly Asn Phe Met Asn Ile Tyr Gln Gln Glu Gly Thr Arg Gly 
145                 150                 155                 160 

Leu Trp Lys Gly Val Ser Leu Thr Ala Gln Arg Ala Ala Ile Val Val 
                165                 170                 175 

Gly Val Glu Leu Pro Val Tyr Asp Ile Thr Lys Lys His Leu Ile Leu 
            180                 185                 190 

Ser Gly Leu Met Gly Asp Thr Val Tyr Thr His Phe Leu Ser Ser Phe 
        195                 200                 205 

Thr Cys Gly Leu Ala Gly Ala Leu Ala Ser Asn Pro Val Asp Val Val 
    210                 215                 220 

Arg Thr Arg Met Met Asn Gln Arg Val Leu Arg Asp Gly Arg Cys Ser 
225                 230                 235                 240 

Gly Tyr Thr Gly Thr Leu Asp Cys Leu Leu Gln Thr Trp Lys Asn Glu 
                245                 250                 255 

Gly Phe Phe Ala Leu Tyr Lys Gly Phe Trp Pro Asn Trp Leu Arg Leu 
            260                 265                 270 

Gly Pro Trp Asn Ile Ile Phe Phe Val Thr Tyr Glu Gln Leu Lys Lys 
        275                 280                 285 

Leu Asp Leu 
    290 
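
As a consistency check on the two entries above, the nucleotide sequence of SEQ ID NO: 1 can be translated with the standard genetic code and compared against SEQ ID NO: 2. The following is a minimal sketch, not part of the original disclosure. The helper name `translate` and the lowercase codon table are illustrative assumptions, and the sequence string is transcribed from the listing with the 10-base blocks joined.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# SEQ ID NO: 1 as transcribed from the listing (whitespace is removed below).
SEQ_ID_NO_1 = "".join("""
atgtcagccc tcaactggaa gccgtttgtg tacggggggc tggcctccat cactgctgag
tgcggtacat ttccaattga tttaaccaag acacggctcc agattcaagg ccagacgaat
gatgcaaaat ttaaggaaat tagataccga ggaatgttgc acgcattagt gaggataggc
agagaagaag ggctgaaagc actctactcg gggattgccc ccgcgatgtt acgccaggca
tcctatggca ccatcaagat aggcacttac cagagcttga agcgactatt cattgaacgc
ccagaagatg aaactctacc gataaatgtg atatgtggaa ttctgtctgg agtcatatct
tcaaccattg ctaatccaac tgatgttttg aaaattcgga tgcaagcgca aagcaacacc
attcaaggag gaatgatagg caacttcatg aacatttacc agcaagaggg gacaagagga
ctgtggaagg gtgtgtccct tactgcgcag agggctgcta ttgttgttgg tgtggagctg
ccggtctatg acatcaccaa gaagcatctt attctctcag gcctgatggg agacactgtg
tatacccact tcctctcaag cttcacctgt ggtctggcag gggccctggc ctcaaaccct
gttgatgttg tgaggacacg tatgatgaat cagagagtgc ttcgagatgg cagatgttct
ggctacacag gaaccctgga ttgcttgtta cagacatgga agaatgaagg gttttttgct
ctctataaag gcttttggcc aaattggttg agacttggtc cttggaatat cattttcttt
gtgacatacg agcagttgaa gaaattggat ttgtga
""".split())


def translate(dna):
    """Translate an open reading frame from its first base to the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends the reading frame
            break
        protein.append(amino_acid)
    return "".join(protein)


protein = translate(SEQ_ID_NO_1)
# Compare with the lengths stated in the listing: 876 bases and 291 residues.
print(len(SEQ_ID_NO_1), len(protein))
print(protein[:16])
```

The printed lengths can be compared with the `<211>` fields of the two entries, and the one-letter output can be read against the first row of SEQ ID NO: 2.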



<210> SEQ ID NO 3 
<211> LENGTH: 882 






<212> TYPE: DNA 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 3 

atgtcagccc tcaactggaa gccgtttgtg tacggggggc tggcctccat cactgctgag 60 

tgcggtacat ttccaattga tttaaccaag acacggctcc agattcaagg ccagacgaat 120 

gatgcaaaat ttaaggaaat tagataccga ggaatgttgc acgcattagt gaggataggc 180 

agagaagaag ggctgaaagc actctactcg gggattgccc ccgcgatgtt acgccaggca 240 

tcctatggca ccatcaagat aggcacttac cagagcttga agcgactatt cattgaacgc 300 

ccagaagatg aaactctacc gataaatgtg atatgtggaa ttctgtctgg agtcatatct 360 

tcaaccattg ctaatccaac tgatgttttg aaaattcgga tgcaagcgca aagcaacacc 420 

attcaaggag gaatgatagg caacttcatg aacatttacc agcaagaggg gacaagagga 480 

ctgtggaagg gtgtgtccct tactgcgcag agggctgcta ttgttgttgg tgtggagctg 540 

ccggtctatg acatcaccaa gaagcatctt attctctcag gcctgatggg agacactgtg 600 

tatacccact tcctctcaag cttcacctgt ggtctggcag gggccctggc ctcaaaccct 660 

gttgatgttg tgaggacacg tatgatgaat cagagagtgc ttcgagatgg cagatgttct 720 

ggctacacag gaaccctgga ttgcttgtta cagcttacag tgctggaaag tttttccacc 780 

acagcaaagc cacaaaagct tatcagcgta gatgccatct cagaagaggc tgataccagg 840 

ggatttacat atctcagctg tgatctttct gctccaagct ga 882 

<210> SEQ ID NO 4 

<211> LENGTH: 293 

<212> TYPE: PRT 

<213> ORGANISM: Homo sapiens 

<400> SEQUENCE: 4 

Met Ser Ala Leu Asn Trp Lys Pro Phe Val Tyr Gly Gly Leu Ala Ser 
1               5                   10                  15 

Ile Thr Ala Glu Cys Gly Thr Phe Pro Ile Asp Leu Thr Lys Thr Arg 
            20                  25                  30 

Leu Gln Ile Gln Gly Gln Thr Asn Asp Ala Lys Phe Lys Glu Ile Arg 
        35                  40                  45 

Tyr Arg Gly Met Leu His Ala Leu Val Arg Ile Gly Arg Glu Glu Gly 
    50                  55                  60 

Leu Lys Ala Leu Tyr Ser Gly Ile Ala Pro Ala Met Leu Arg Gln Ala 
65                  70                  75                  80 

Ser Tyr Gly Thr Ile Lys Ile Gly Thr Tyr Gln Ser Leu Lys Arg Leu 
                85                  90                  95 

Phe Ile Glu Arg Pro Glu Asp Glu Thr Leu Pro Ile Asn Val Ile Cys 
            100                 105                 110 

Gly Ile Leu Ser Gly Val Ile Ser Ser Thr Ile Ala Asn Pro Thr Asp 
        115                 120                 125 

Val Leu Lys Ile Arg Met Gln Ala Gln Ser Asn Thr Ile Gln Gly Gly 
    130                 135                 140 

Met Ile Gly Asn Phe Met Asn Ile Tyr Gln Gln Glu Gly Thr Arg Gly 
145                 150                 155                 160 

Leu Trp Lys Gly Val Ser Leu Thr Ala Gln Arg Ala Ala Ile Val Val 
                165                 170                 175 

Gly Val Glu Leu Pro Val Tyr Asp Ile Thr Lys Lys His Leu Ile Leu 
            180                 185                 190 

Ser Gly Leu Met Gly Asp Thr Val Tyr Thr His Phe Leu Ser Ser Phe 
        195                 200                 205 

Thr Cys Gly Leu Ala Gly Ala Leu Ala Ser Asn Pro Val Asp Val Val 
    210                 215                 220 

Arg Thr Arg Met Met Asn Gln Arg Val Leu Arg Asp Gly Arg Cys Ser 
225                 230                 235                 240 

Gly Tyr Thr Gly Thr Leu Asp Cys Leu Leu Gln Leu Thr Val Leu Glu 
                245                 250                 255 

Ser Phe Ser Thr Thr Ala Lys Pro Gln Lys Leu Ile Ser Val Asp Ala 
            260                 265                 270 

Ile Ser Glu Glu Ala Asp Thr Arg Gly Phe Thr Tyr Leu Ser Cys Asp 
        275                 280                 285 

Leu Ser Ala Pro Ser 
    290 













What is claimed is: 

1. An isolated nucleic acid molecule comprising at least 
24 contiguous bases of a human nucleotide sequence from 
the uncoupling protein polynucleotide sequence described in 
SEQ ID NO: 1. 

2. An isolated nucleic acid molecule comprising a human 
nucleotide sequence that: 

(a) encodes the amino acid sequence shown in SEQ ID 
NO: 2; and 

(b) hybridizes under highly stringent conditions to the 
nucleotide sequence of SEQ ID NO: 1 or the comple- 
ment thereof. 



3. An isolated nucleic acid molecule comprising a human 
nucleotide sequence encoding the amino acid sequence of 
SEQ ID NO:2. 

4. An isolated nucleic acid molecule comprising the 
human nucleotide sequence of the uncoupling protein poly- 
nucleotide sequence described in SEQ ID NO: 3. 

5. An isolated nucleic acid molecule comprising a nucle- 
otide sequence encoding the amino acid sequence of SEQ ID 
NO: 4. 

* * * * * 